omegahh / DeepHiC

A GAN-based method for Enhancing Hi-C data
MIT License

Questions about downsampling #6

Open amssljc opened 3 years ago

amssljc commented 3 years ago

Hi, if I downsample with a ratio of 1/16, I would expect the max value to also be about 1/16 of the max value of the original matrix. However, the max value in Figure 2a of your paper is 210+, while the max value of the downsampled matrix is 50+. That ratio is about 1/4, which is not consistent with 1/16. Could you please help me? I'm quite confused. Thanks!

omegahh commented 1 year ago

Hello amssljc, all the 'max values' shown are actually 99th percentiles, not the true maximum values of the matrices. The true max value of the original matrix can exceed 10^5 (especially for entries near the diagonal). When we perform the 1/16 downsampling, we sample Hi-C read counts without replacement. Because the whole matrix A is very sparse and read counts at long genomic distances are very small, some A_{ij} entries are easily downsampled to zero. In short, the 99th percentile cannot be estimated simply by multiplying by the downsampling ratio.
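
To make the point above concrete, here is a minimal sketch on a toy matrix. It is not the DeepHiC downsampling code: the function names, the 100 x 100 size, and the decay rule used to build the counts are all assumptions chosen for illustration, and the toy ignores the symmetry of real Hi-C matrices. It draws 1/16 of the reads without replacement and prints the true max, the 99th percentile, and the number of zero entries before and after, so the two ratios can be compared with 1/16.

```python
import random
from collections import Counter

def downsample_without_replacement(matrix, ratio, rng):
    """Keep a `ratio` fraction of the reads; each read is drawn at most once."""
    # One list entry per read, labelled with the (i, j) bin pair it falls in.
    reads = [(i, j) for i, row in enumerate(matrix)
             for j, count in enumerate(row) for _ in range(count)]
    kept = Counter(rng.sample(reads, int(len(reads) * ratio)))
    # Bin pairs that lost all their reads are missing from `kept` and read as 0.
    return [[kept[(i, j)] for j in range(len(row))] for i, row in enumerate(matrix)]

def percentile_99(matrix):
    """Nearest-rank 99th percentile over all entries of the matrix."""
    values = sorted(v for row in matrix for v in row)
    rank = (99 * len(values) + 99) // 100  # ceil(0.99 * n) in integer arithmetic
    return values[rank - 1]

def count_zeros(matrix):
    return sum(v == 0 for row in matrix for v in row)

# Hypothetical toy matrix: counts decay quickly with distance from the diagonal,
# so entries far from the diagonal are already zero before downsampling.
n = 100
original = [[int(500 / (1 + abs(i - j)) ** 2) for j in range(n)] for i in range(n)]

ratio = 1 / 16
low = downsample_without_replacement(original, ratio, random.Random(0))

for name, m in (("original", original), ("downsampled", low)):
    print(name, "max:", max(map(max, m)), "p99:", percentile_99(m),
          "zeros:", count_zeros(m))
print("p99 ratio:", percentile_99(low) / percentile_99(original),
      "vs sampling ratio:", ratio)
```

Only the total read count is guaranteed to scale by the sampling ratio here. The percentile is taken over integer counts in a mostly empty matrix, so its ratio depends on how the counts are distributed and does not have to match 1/16.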