ddepierre opened 1 year ago
These are good questions, although there is a lot to unpack here. The most important idea I can give you is to calculate coverage with `ignore_diags=0`; I think that should solve all your problems...
The values are indeed stored in the file, and `.sample()` uses the stored cis count when it is available in the file. When it is not, it calculates the coverage with default arguments (perhaps not ideal; this should be changed to `ignore_diags=0`) and uses that.
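The lookup-or-compute behaviour described above can be sketched on plain pandas tables. This is a toy reimplementation, not the code in `cooltools/api/sample.py`: the helper name `cis_count_for_sampling` is hypothetical, the pixel table is assumed to carry `chrom1`/`chrom2` columns, and only the `cov_cis_raw` column name comes from this thread. Since per-bin coverage credits each cis contact to both of its anchor bins, the sketch halves the stored sum to get a contact count.

```python
import pandas as pd


def cis_count_for_sampling(bins: pd.DataFrame, pixels: pd.DataFrame) -> int:
    """Hypothetical helper: total cis contacts used to set a sampling target.

    Prefers a stored per-bin coverage column; otherwise counts cis pixels
    directly, which corresponds to coverage with ignore_diags=0.
    """
    if "cov_cis_raw" in bins.columns:
        # per-bin coverage counts each cis contact once per anchor bin, so halve it.
        # Caveat: the result depends on whatever ignore_diags was used when it was stored.
        return int(bins["cov_cis_raw"].sum() // 2)
    is_cis = pixels["chrom1"] == pixels["chrom2"]
    # ignore_diags=0: every cis pixel counts, including the main diagonal
    return int(pixels.loc[is_cis, "count"].sum())
```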
Discussion 2023/10/9:
perhaps we should deprecate the use of stored cis counts in `sample()`, as it is not obvious how to link this stored value to the number of diagonals ignored in the previous `coverage()` calculation.
https://github.com/open2c/cooltools/blob/0a0d8417099f182e1e24b3897fdf41de6b08844a/cooltools/api/sample.py#L100
Hi, part of this bug has already been reported here; I have the same issue with cooltools version 0.5.4.
Also, while writing this issue, I realized I may have too many questions for one post; sorry about that.
I have some cool files that I want to downsample to a given cis contact count, so that all my replicates and conditions have the same number of cis contacts and I can compare them. (First of all, am I right to use downsampling as a way to normalize sample coverage between conditions?)
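If every replicate and condition is brought to the same cis contact count, the natural common target is the smallest count among them, because downsampling can only remove contacts. A minimal sketch of that arithmetic; the sample names and counts are made up for illustration, and every count is assumed to be computed the same way (for example with the same `ignore_diags`).

```python
def common_cis_target(cis_counts: dict[str, int]) -> tuple[int, dict[str, float]]:
    """Return the shared target (the smallest cis count) and each sample's keep fraction."""
    target = min(cis_counts.values())
    # fraction of contacts each sample must keep to reach the shared target
    fracs = {name: target / n for name, n in cis_counts.items()}
    return target, fracs


# illustrative numbers only
counts = {"ctrl_rep1": 12_000_000, "ctrl_rep2": 9_500_000, "treat_rep1": 15_000_000}
target, fracs = common_cis_target(counts)
print(target)  # 9500000
```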
1/ Get the cis contact count
With `cooltools.coverage()` or with `expected_cis()`:
Counts are doubled in `cov_cis_raw`, as already reported here.
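A likely explanation for the doubling, sketched on a toy pixel table: per-bin cis coverage credits every cis contact to both of its anchor bins, so summing it over all bins gives twice the number of cis contacts, while `ignore_diags` decides which near-diagonal pixels are counted at all. This is a toy numpy reimplementation of the idea, not the cooltools code; the `chrom1`/`chrom2` pixel columns and the function name are assumptions.

```python
import numpy as np
import pandas as pd


def toy_cis_coverage(pixels: pd.DataFrame, n_bins: int, ignore_diags: int = 0) -> np.ndarray:
    """Per-bin cis coverage from an upper-triangle pixel table (toy version)."""
    cis = pixels[pixels["chrom1"] == pixels["chrom2"]]
    # drop pixels on the first `ignore_diags` diagonals; 0 keeps the main diagonal
    cis = cis[(cis["bin2_id"] - cis["bin1_id"]) >= ignore_diags]
    # each cis pixel adds its count to BOTH of its anchor bins
    cov = np.bincount(cis["bin1_id"], weights=cis["count"], minlength=n_bins)
    cov += np.bincount(cis["bin2_id"], weights=cis["count"], minlength=n_bins)
    return cov


# bins 0-3 are on chr1, bins 4-5 on chr2; the (2, 4) pixel is trans
pixels = pd.DataFrame({
    "bin1_id": [0, 0, 1, 2, 4],
    "bin2_id": [0, 1, 3, 4, 5],
    "chrom1": ["chr1", "chr1", "chr1", "chr1", "chr2"],
    "chrom2": ["chr1", "chr1", "chr1", "chr2", "chr2"],
    "count": [5, 3, 2, 4, 1],
})

cov = toy_cis_coverage(pixels, n_bins=6, ignore_diags=0)
print(cov.sum())      # 22.0: twice the 11 cis contacts
print(cov.sum() / 2)  # 11.0: the actual cis contact count
```

Raising `ignore_diags` shrinks the total (here to 12.0 with `ignore_diags=1` and 4.0 with `ignore_diags=2`), which is why a stored coverage value is only meaningful together with the `ignore_diags` it was computed with.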
2/ Downsampling
I have an unbalanced cool file like this:
which I want to downsample:
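Conceptually, downsampling acts on raw integer counts, so the sampling step itself needs no balancing weights. The sketch below uses binomial thinning: each contact is kept independently with probability `target / cis_total`. It is a generic technique and not necessarily what `cooltools.sample()` does internally; the helper name `thin_to_cis_count` and the `chrom1`/`chrom2` pixel columns are assumptions. Because the thinning is random, the resulting cis total matches the target only approximately.

```python
import numpy as np
import pandas as pd


def thin_to_cis_count(pixels: pd.DataFrame, target_cis: int, seed=None) -> pd.DataFrame:
    """Hypothetical helper: randomly thin raw counts so the EXPECTED cis total is target_cis.

    Assumes integer raw counts in `count` and `chrom1`/`chrom2` columns on each pixel.
    No balancing weights are used.
    """
    is_cis = (pixels["chrom1"] == pixels["chrom2"]).to_numpy()
    cis_total = int(pixels.loc[is_cis, "count"].sum())
    if cis_total == 0:
        raise ValueError("no cis contacts to sample from")
    if target_cis > cis_total:
        raise ValueError("downsampling cannot add contacts: target_cis > cis_total")
    frac = target_cis / cis_total
    rng = np.random.default_rng(seed)
    out = pixels.copy()
    # keep each individual contact with probability `frac`; cis and trans pixels
    # share the same fraction, so their ratio is preserved on average
    out["count"] = rng.binomial(pixels["count"].to_numpy(), frac)
    # sparse pixel tables usually omit empty pixels, so drop them
    return out[out["count"] > 0].reset_index(drop=True)
```

Balancing, if needed, would then be run on the thinned table afterwards.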
- Is it possible to sample contacts from raw matrices? What is the point of downsampling after balancing?
Also:
But `ignore_diags` is not an argument of `sample()`.
- To bypass the balancing requirement in `sample()`, can I copy the `cov_cis_raw` column in `.bins()` into a column named `weight`, so that the cool file looks balanced and the downsampling runs on raw cis contacts? I would then need to balance it after downsampling anyway.
- Usually, is downsampling done on the full matrix, or do you ignore the first diagonal(s) for some reason?
- Additional question:
`store` (bool, optional): "If True, store the results in the input cooler file when finished." Does this mean that the result is stored in the Python variable, or that the cool file is modified directly? I am not sure whether the matrices are loaded and only the loaded copy is modified, or whether the original file itself is modified.
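One way to check whether a call with `store=True` changed the file on disk (rather than only an in-memory object) is to reopen the file and look for the new column. A minimal sketch with h5py, assuming a single-resolution .cool layout with a top-level `bins` group (for an .mcool, the group path would include the resolution, e.g. `resolutions/<res>/bins`). The helper name is hypothetical; `cov_cis_raw` is the column name mentioned in this thread.

```python
import h5py


def bins_has_column(cool_path: str, column: str, group: str = "bins") -> bool:
    """Return True if `column` exists in the given bins group of an HDF5 (.cool) file."""
    # open read-only so the check itself never modifies the file
    with h5py.File(cool_path, "r") as f:
        return column in f[group]
```

Running this before and after the coverage call shows whether the column was written into the file itself.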
Thanks for the support, David