open2c / coolpuppy

A versatile tool to perform pile-up analysis on Hi-C data in .cool format.
MIT License

[Q] Question about normalization #122

Closed jiangshan529 closed 1 year ago

jiangshan529 commented 1 year ago

Hi, I downloaded a .hic file from the 4DN database and started downstream analysis. From the GitHub tutorial and other people's questions, it seems the normalization step is critical for pileup analysis. As far as I can tell, 'juicer pre' already creates several sets of normalizations, so I did not do any other normalization and used the downloaded .hic data directly for pileup analysis. Is that a problem? Thanks!

Phlya commented 1 year ago

It should be fine. Make sure you specify the appropriate weight name when using coolpuppy: the default is "weight", as generated by cooler, but .hic files might contain multiple normalization vectors with different names.
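To see which names are available, here is a minimal sketch. It assumes the .hic file has already been converted to a cooler (for example with hic2cool) and that normalization vectors are stored as extra columns in the bin table. The file name, resolution and the "KR" name in the comments are placeholders, and `clr_weight_name` is the keyword as I understand it in recent coolpuppy versions, so check it against your installed version.

```python
# Columns every cooler bin table has; anything else may be a normalization vector.
COORDINATE_COLUMNS = {"chrom", "start", "end"}


def candidate_weight_names(bin_columns):
    """Return the bin-table columns that are not genomic coordinates."""
    return [c for c in bin_columns if c not in COORDINATE_COLUMNS]


def list_weight_names(cool_uri):
    """Open a cooler and list the columns that could hold balancing weights."""
    import cooler  # imported lazily so the helper above works without cooler

    clr = cooler.Cooler(cool_uri)
    return candidate_weight_names(clr.bins().columns)


# Hypothetical usage (placeholder path and weight name):
# names = list_weight_names("sample.mcool::/resolutions/10000")
# print(names)  # e.g. pick the one you want, then pass clr_weight_name="KR"
```

Not every extra column is necessarily a normalization vector, so treat the output as a list of candidates rather than a definitive answer.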

jiangshan529 commented 1 year ago

Hi, Ilya. Thanks for your reply! I now have two datasets with different sequencing depths. Should I normalize the two matrices to the same depth before running the pileup analysis, or is it OK to run pileups directly, since the final values are always small numbers (which suggests some kind of normalization is already applied)?

Phlya commented 1 year ago

Sorry, forgot to reply! In most cases, pileups are very robust to differences in sequencing depth, so you can usually ignore it unless one of the pileups is visibly much noisier than the other.
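A toy illustration of why depth tends to cancel out, assuming the pileup is expressed as observed over expected. This is a simplification with made-up numbers, not coolpuppy's actual implementation: if every count scales by the same factor, the expected value scales too and the ratio stays the same.

```python
def observed_over_expected(snippets, expected):
    """Average the snippets element-wise, then divide by the expected value."""
    n = len(snippets)
    size = len(snippets[0])
    mean = [sum(s[i] for s in snippets) / n for i in range(size)]
    return [m / expected for m in mean]


def scale(snippets, factor):
    """Multiply every value by a constant, mimicking deeper sequencing."""
    return [[v * factor for v in s] for s in snippets]


# Made-up 1D "snippets" around two features, plus a made-up expected value.
shallow = [[2.0, 8.0, 2.0], [4.0, 12.0, 4.0]]
shallow_expected = 3.0

depth_factor = 5.0  # the deeper dataset has 5x the reads in this toy
deep = scale(shallow, depth_factor)
deep_expected = shallow_expected * depth_factor

pile_shallow = observed_over_expected(shallow, shallow_expected)
pile_deep = observed_over_expected(deep, deep_expected)
print(pile_shallow)
print(pile_deep)
```

What this toy does not capture is sampling noise: real shallow data has fewer reads per pixel, which is why a much noisier pileup is the thing to watch for.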

You can also easily downsample a cooler using cooltools and compare, just to be sure.
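A rough sketch of that check, assuming both files are coolers at the same resolution, that `clr.info["sum"]` holds the total contact count (it may be missing in files not written by cooler), and that `cooltools.sample` accepts a `frac` argument as in recent cooltools releases. The function names and paths are placeholders.

```python
def matching_fraction(total_a, total_b):
    """Fraction of the deeper dataset to keep so that both totals match."""
    if total_a <= 0 or total_b <= 0:
        raise ValueError("totals must be positive")
    return min(total_a, total_b) / max(total_a, total_b)


def downsample_deeper(path_a, path_b, out_path):
    """Downsample whichever cooler has more contacts to the other's depth."""
    import cooler  # imported lazily so matching_fraction works on its own
    import cooltools

    clr_a, clr_b = cooler.Cooler(path_a), cooler.Cooler(path_b)
    # Assumption: the "sum" metadata field is present in both files.
    total_a, total_b = clr_a.info["sum"], clr_b.info["sum"]
    deeper = clr_a if total_a >= total_b else clr_b
    cooltools.sample(deeper, out_path, frac=matching_fraction(total_a, total_b))
    return out_path


# Hypothetical usage (placeholder paths):
# downsample_deeper("a.cool", "b.cool", "deeper_downsampled.cool")
```

The downsampled file will likely need to be balanced again before running the pileup, so check whether it has the weight column you plan to use.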