open2c / cooltools

The tools for your .cool's
MIT License

dist_bp and contact_frequency fields missing from CVD function. #512

Closed ratheraarif closed 7 months ago

ratheraarif commented 8 months ago

Hi,

Thank you for making such a wonderful package. I have an issue: when I run the cvd function from cooltools, the output does not contain the dist_bp and contact_frequency fields. I can generate dist_bp by multiplying dist by the resolution, but I don't know how to generate the contact_frequency field.
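The dist_bp workaround described above (dist multiplied by the resolution) can be sketched with pandas. This is only an illustration: the helper name add_dist_bp and the toy table are hypothetical, and the real output table has more columns than shown here.

```python
import pandas as pd


def add_dist_bp(expected: pd.DataFrame, resolution: int) -> pd.DataFrame:
    """Return a copy of the table with dist_bp = dist * resolution."""
    out = expected.copy()
    out["dist_bp"] = out["dist"] * resolution
    return out


# Toy table with made-up values, standing in for the real output.
toy = pd.DataFrame({"dist": [0, 1, 2, 3]})
with_bp = add_dist_bp(toy, resolution=1000)
print(with_bp)
```

The resolution has to match the bin size of the cooler the table was computed from (for example 1000 for a 1 kb matrix).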

Kindly help!

gfudenberg commented 8 months ago

It seems you have an older version of cooltools. Please update to the latest release, 0.6.1, and let us know if the issue is resolved: https://pypi.org/project/cooltools/
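To confirm which version is actually installed in the active environment, a small check with the standard library can help. This is a minimal sketch; the helper name installed_version is hypothetical and it only reports what the packaging metadata says.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Prints the installed cooltools version, or None if it is not installed.
print("cooltools:", installed_version("cooltools"))
```

If the printed version is older than expected after an upgrade, the upgrade may have gone into a different environment than the one running the analysis.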

ratheraarif commented 8 months ago

Even after updating to the latest version, the problem is still unsolved.

gfudenberg commented 8 months ago

Any guesses @Yaoyx ?

In the meantime, you can use the balanced.avg column; setting aggregate_smoothed=True is also generally useful!

Yaoyx commented 8 months ago

I just found that the package has not been updated on PyPI yet.

Yaoyx commented 8 months ago

Hi @ratheraarif, in case you want to use the latest features now, you can git clone the repo and, inside the cloned repo, run python setup.py install to install the latest version manually.

ratheraarif commented 7 months ago

Thank you!

Phlya commented 7 months ago

The package has been updated on PyPI, so I am assuming this is solved now. Feel free to reopen if the issue persists.

fengchuiguo1994 commented 1 month ago

I had the same problem. When I remove the "chr" prefix from the chromosome names, then generate the hic file, convert it to mcool, and balance it, everything is OK. When I keep the "chr" prefix and run the same steps, it goes wrong: the output does not have dist_bp and contact_frequency.

For the short 4DN format:
0 chr3 1000 0 1 chr3 2000 1
or
0 3 1000 0 1 3 2000 1 (use this)

For the 4DN format:
readid chr3 1000 chr3 2000 - +
or
readid 3 1000 3 2000 - + (use this)

This phenomenon occurs in multiple versions.
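The workaround described above (dropping the "chr" prefix before building the hic file) could be scripted roughly as follows. This is a sketch under assumptions: the function names are hypothetical, the chromosome column positions are taken from the two example lines in this comment, and header lines starting with "#" are passed through unchanged, so any chromosome names listed in a real file header would need the same treatment.

```python
def strip_chr(name: str) -> str:
    """Drop a leading 'chr' from a chromosome name, if present."""
    return name[3:] if name.startswith("chr") else name


def strip_chr_from_pairs_line(line: str, chrom_columns=(1, 3)) -> str:
    """Rewrite one pairs line with 'chr' removed from the chromosome columns.

    chrom_columns are 0-based field indices: (1, 3) matches the 4DN example
    above and (1, 5) matches the short-format example.
    """
    if line.startswith("#"):
        return line  # header lines are left untouched in this sketch
    sep = "\t" if "\t" in line else " "
    fields = line.rstrip("\n").split(sep)
    for i in chrom_columns:
        if i < len(fields):
            fields[i] = strip_chr(fields[i])
    return sep.join(fields)


print(strip_chr_from_pairs_line("readid chr3 1000 chr3 2000 - +"))
print(strip_chr_from_pairs_line("0 chr3 1000 0 1 chr3 2000 1", chrom_columns=(1, 5)))
```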

fengchuiguo1994 commented 1 month ago

cooltools expected-cis hic2cool.mcool::/resolutions/1000 --nproc 5 -o test.1k.tsv
sed '1d' test.1k.tsv | awk '$5!="nan"' | awk '{sum+=$5}END{print sum}'    # 6.78395

Why is the total contact_frequency 6.78395 and not 1?
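For reference, the sed/awk pipeline above (skip the header, drop "nan" rows, sum the fifth column) can be reproduced in Python with the standard library. This is a rough equivalent only; the function name column_sum, the column names, and the values in the toy table are made up for illustration.

```python
import csv
import io
import math


def column_sum(tsv_text: str, column_index: int = 4) -> float:
    """Sum one column of a TSV table, skipping the header and NaN/empty cells."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    next(reader, None)  # skip the header row, like sed '1d'
    total = 0.0
    for row in reader:
        if len(row) <= column_index or row[column_index] == "":
            continue
        value = float(row[column_index])
        if not math.isnan(value):  # like awk '$5!="nan"'
            total += value
    return total


# Toy table: hypothetical column names and made-up values.
toy_tsv = (
    "region1\tregion2\tdist\tn_valid\tvalue\n"
    "3\t3\t0\t10\t0.5\n"
    "3\t3\t1\t9\tnan\n"
    "3\t3\t2\t8\t0.25\n"
)
print(column_sum(toy_tsv))
```

Note that this sums whatever sits in the fifth column of the file, so it is worth checking the header to see which column that is before interpreting the total.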

Yaoyx commented 1 month ago

Hi,

Here's the definition of contact_frequency in the table:

contact_freq: The "most processed" contact frequency value. For example, if balanced & smoothing then this will return the balanced.avg.smooth.agg; if aggregated+smoothed, then balanced.avg.smooth.agg; if nothing then count.avg.
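Read as a rule, the definition above picks the most processed column that is present in the table. A minimal sketch of that idea follows; the function name and the exact priority order are assumptions for illustration, built only from the column names mentioned in this thread, and this is not the library's own code.

```python
# Assumed order, most processed first; only names that appear in this thread.
PRIORITY = ("balanced.avg.smooth.agg", "balanced.avg", "count.avg")


def most_processed_column(columns):
    """Return the first column from PRIORITY that exists, or None if none do."""
    for name in PRIORITY:
        if name in columns:
            return name
    return None


print(most_processed_column(["dist", "count.avg", "balanced.avg"]))
print(most_processed_column(["dist", "count.avg"]))
```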