robinweide / GENOVA

GENome Organisation Visual Analytics
GNU General Public License v3.0
68 stars 15 forks source link

calculating insulation score get inconsistant results #350

Open jiangshan529 opened 5 months ago

jiangshan529 commented 5 months ago

Hi, I am using GENOVA to calculate insulation score. the code is

hic <- load_contacts(signal_path = './hela_s3_insitu_hic_hg38_4DNFIAS8LV1C_5k.cool') insulation <- insulation_score(hic, window = 25 ) data <- setNames( insulation$insula_score[-4], c("chrom", "start", "end", "value") ) data <- data[is.finite(data$value),] gr <- with(data, GRanges(chrom, IRanges(start + 1, end))) score(gr) <- data$value export.bedGraph(gr, here::here("my_bedgraph.bedGraph")).

Previously, when I use window 25 for resolution of 5k, it always gives me narrow windows of insulation score switching between negative and positive values. But this time when I use the same code for another file of 5k insulation(around 500M reads), it gives me very wide window, and the most negative values cannot align to TAD boundaries in hic map. Could you please give me some hints on what's the problem here?

image the first track is the insulation score this time, the second track is previous calculated insulation score. Thanks!

teunbrand commented 5 months ago

Is this specific to this region or genome wide? Also have you looked at the contact maps themselves and do they noticably differ near the diagonal?

jiangshan529 commented 5 months ago

Is this specific to this region or genome wide? Also have you looked at the contact maps themselves and do they noticably differ near the diagonal?

It's genome-wide. And I extracted matrix from some regions and plot the hic heatmap, it looks alright.

teunbrand commented 5 months ago

So the only difference between the top and bottom is the data itself, right? No differences in data resolution, preprocessing, analysis code such as parameters, versions of software, etc?

jiangshan529 commented 5 months ago

So the only difference between the top and bottom is the data itself, right? No differences in data resolution, preprocessing, analysis code such as parameters, versions of software, etc?

yeah, for the ins calculation itself, I also tested the code with my previous data, there's no difference. However, I also cannot find what' the difference for the data preprocessing itself. I use bwa mem to map, then pairtools for contact matrics, and then juicertools to generate .hic file, then hic2cool to extract 5kb cool file, then cooler balance to normalize the data, then use GENOVA to calculate ins. I also tried not to use cooler balance, but it still gives me the same wired ins value.

teunbrand commented 5 months ago

I sympathise with the detective work, but if this is a data issue and not a software issue, there is little I can do. There isn't anything obvious in the tracks that I recognise as a particular problem with a clear solution.