nservant / HiC-Pro

HiC-Pro: An optimized and flexible pipeline for Hi-C data processing

Normalisation of contact matrix by sequencing depth #378

Closed Ferossitiziano closed 3 years ago

Ferossitiziano commented 4 years ago

Hi!

I was wondering if there's a point in the HiC-Pro pipeline where the number of reported interactions is normalised by the total number of reads used as input for each sample.

In my WT sample the number of reads is three times higher than in my KO sample. Does the pipeline take care of that, or do I have to apply some normalisation at the end of the process in order to compare WT and KO?

Thank you,

Federico

nservant commented 4 years ago

Hi Federico,

No, HiC-Pro does not perform any normalisation by sequencing depth. This is something you should do yourself during downstream analysis.

Best

biozzq commented 3 years ago

Dear @nservant

Does ICE normalization take the sequencing depth into account? Thank you very much.

Best wishes, Zheng zhuqing

nservant commented 3 years ago

Hi,

The goal of the ICE method is to iteratively normalize the data so that the sum of each genomic bin is equal to a constant. In the original ICE paper (Imakaev et al. 2012), this constant is set to 1. In the iced Python package, if I remember correctly, the constant is set to a mean signal. You can easily check this by loading your contact maps in R or Python and computing the row sums (genome-wide).

Thus, to me, there is no normalization by sequencing depth per se, and I think this is something you should do yourself before downstream analysis.

Best
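The row-sum check suggested above could look like the following minimal sketch. It assumes the contact map is stored as tab-separated triplets (`bin_i`, `bin_j`, `count`) with 1-based bin indices and only the upper triangle of the symmetric matrix written out; please verify this against your own files. The function name `row_sums_from_triplets` and the toy data are hypothetical and only serve as illustration.

```python
import io

import numpy as np


def row_sums_from_triplets(handle, n_bins):
    """Genome-wide row sums of a symmetric contact map stored as triplets.

    Assumption: each line is `bin_i <tab> bin_j <tab> count`, indices are
    1-based, and only the upper triangle (bin_i <= bin_j) is stored.
    """
    data = np.loadtxt(handle, ndmin=2)
    i = data[:, 0].astype(int) - 1  # convert to 0-based indices
    j = data[:, 1].astype(int) - 1
    counts = data[:, 2]

    sums = np.zeros(n_bins)
    np.add.at(sums, i, counts)  # contribution to row i
    # Mirror off-diagonal entries so that row j also receives the contact.
    off_diag = i != j
    np.add.at(sums, j[off_diag], counts[off_diag])
    return sums


# Toy 3-bin matrix standing in for a real file opened with open(path).
toy = io.StringIO("1\t1\t4\n1\t2\t2\n2\t3\t6\n3\t3\t1\n")
sums = row_sums_from_triplets(toy, n_bins=3)
print(sums)
```

On an ICE-normalized map the row sums of the non-filtered bins should come out roughly equal to each other; the value of that constant shows which convention was used.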

rikrdo89 commented 3 years ago

Hi Nicolas,

To follow up on these points: if the goal of an experiment is to compare changes across two or more samples processed using HiC-Pro, are the ICE-normalized matrices enough to plot and "see/compare" changes, or do you suggest further processing the matrices before visually inspecting any changes across samples? Is read depth a critical factor to take into account after ICE normalization for inter-sample comparison? If yes, would you recommend throwing out reads (so each sample has the same number of raw reads) before running HiC-Pro, or is it recommended to normalize the samples by mapped reads after running HiC-Pro?

nservant commented 3 years ago

Hi,

ICE-normalized data does not take sequencing depth into account. Actually, if I'm not wrong (but you could double-check that), the sum of interactions in the ICE matrix should be the same as the sum of interactions in the raw matrix. Thus, if you want to compare two contact maps, I would suggest adding an additional normalization step on the total number of reads, for instance by simply scaling the counts to 1 million ...

Best,
Nicolas
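One possible reading of this suggestion is a "contacts per million" rescaling, sketched below. This is an assumption, not a HiC-Pro feature: the helper `scale_to_per_million` is hypothetical, and by default it uses the sum of the counts in the matrix as the total, whereas you may prefer to pass the number of valid read pairs reported for your sample.

```python
import numpy as np


def scale_to_per_million(counts, total=None):
    """Rescale contact counts so that they sum to one million.

    `counts` is the count column of a contact map (raw or ICE-normalized).
    `total` defaults to the sum of `counts`; pass your own depth estimate
    (for example the number of valid pairs) if you prefer another denominator.
    """
    counts = np.asarray(counts, dtype=float)
    if total is None:
        total = counts.sum()
    if total <= 0:
        raise ValueError("total must be positive")
    return counts * (1e6 / total)


# Toy example: WT has three times the depth of KO but the same contact pattern.
wt = np.array([300.0, 600.0, 2100.0])
ko = np.array([100.0, 200.0, 700.0])

wt_cpm = scale_to_per_million(wt)
ko_cpm = scale_to_per_million(ko)
print(wt_cpm, ko_cpm)
```

After rescaling, both toy samples have identical values, so any remaining difference between real WT and KO maps is no longer driven by the threefold difference in depth alone. Downsampling reads before running the pipeline is an alternative that this sketch does not cover.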