handling of 0 in cooler/cooltools

kimj50 commented 3 years ago

Hi, Thank you for the cool tools. I am trying out the tools to analyze Hi-C matrices, but I am also new to computing. I recently ran into a problem where a .cool file converted from .h5 (HiCExplorer) had 0 values in place of NaN. Submitted issue: https://github.com/deeptools/HiCExplorer/issues/734 This was 'solved' by substituting NaN for 0 in each stack of the pileup matrix. But now I'm concerned that those 0s, which should have been NaN, might be affecting the expected diagonal. I was wondering if there is a quick function I can run in Python or on the command line to substitute 0 with NaN in a cool file (and whether this would be a correct strategy). Thank you, Jun

Phlya commented 3 years ago

Sorry for the late reply.

I'm not sure what HiCExplorer does when it creates cool files, but normally NaNs appear when regions are masked during matrix normalization. Pixels with a NaN weight in either bin are then ignored in all analyses. Treating them as non-NaN and equal to 0 would indeed break the assumptions in the computation of expected: it doesn't count NaNs at all, but it still includes 0s in the calculation of the average interaction frequency.
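To make the effect on expected concrete, here is a toy sketch in plain Python. It is not the actual cooltools code; the `diagonal_mean` helper and the pixel values are made up for illustration. It only shows how averaging one diagonal changes when masked pixels are stored as 0 instead of NaN.

```python
import math


def diagonal_mean(values):
    """Average the pixels of one diagonal, skipping NaN (masked) pixels."""
    valid = [v for v in values if not math.isnan(v)]
    return sum(valid) / len(valid) if valid else math.nan


# Hypothetical balanced values along one diagonal.
# Two of the six pixels touch a masked bin.
masked_as_nan = [0.4, 0.6, math.nan, math.nan, 0.5, 0.5]

# The same diagonal after a conversion that turned the masked pixels into 0.
masked_as_zero = [0.0 if math.isnan(v) else v for v in masked_as_nan]

expected_nan = diagonal_mean(masked_as_nan)    # averages the 4 valid pixels
expected_zero = diagonal_mean(masked_as_zero)  # averages all 6 pixels

print(expected_nan, expected_zero)
```

With the masked pixels counted as zeros, the average for that diagonal is pulled down, which is the bias in expected described above.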

You shouldn't just substitute one for the other, since a lot of the zeros in your data might be (are) real zeros, i.e. no interactions observed, but the region is not masked. Why you want NaN as the weights of your masked regions is also explained above (note that just having NaNs instead of the actual stored values will not work either AFAIK, since cooltools counts NaNs based on the weights, not the actual stored interaction frequencies, which are expected not to contain NaNs).
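A small sketch of why the mask lives in the weights rather than in the stored counts. It assumes the usual multiplicative balancing, where a balanced value is the raw count times the weights of its two bins. The counts and weights are invented, and this is plain Python rather than the cooler API.

```python
import math

# Hypothetical raw counts for a symmetric 3-bin matrix.
# The stored counts never contain NaN; a 0 just means "no contacts observed".
counts = [
    [10, 4, 0],
    [4, 8, 3],
    [0, 3, 12],
]

# Hypothetical balancing weights: a NaN weight marks bin 1 as masked.
weights = [0.5, math.nan, 0.25]

# Balanced value = count * weight[row bin] * weight[column bin].
n = len(counts)
balanced = [
    [counts[i][j] * weights[i] * weights[j] for j in range(n)]
    for i in range(n)
]

for row in balanced:
    print(row)
```

Every pixel in the row and column of the masked bin becomes NaN, while the pixel between bins 0 and 2 stays a real 0. Replacing all zeros with NaN would therefore throw away genuine observations.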

Phlya commented 3 years ago

FYI looking at your HiCExplorer issue: I would recommend importing the juicer output using cooler cload pairs, and then balancing using cooler balance, to avoid any issues.

kimj50 commented 3 years ago

Thanks! As you suggested, we ended up switching completely to cooler, which solved all the problems. HiCExplorer also seems to be compatible with the cool format, which allows using both tools with minimal problems.

open2c / cooltools

handling of 0 in cooler/cooltools #253