open2c / coolpuppy

A versatile tool to perform pile-up analysis on Hi-C data in .cool format.
MIT License

how to plot short and long contact frequency #111

Closed BenxiaHu closed 1 year ago

BenxiaHu commented 1 year ago

Hello, I was wondering whether coolpup.py can make APA plots for short- and long-range contact frequency. Please see the following picture.

image

Could you explain what [--mindist MINDIST] and [--maxdist MAXDIST] do? Are both compatible with [--flank FLANK]? Best,

Phlya commented 1 year ago

Yes, you can simply use the --by_distance argument for this!

Or indeed run the same command a few times separately with different --mindist --maxdist arguments, in base pairs. They are completely independent of --flank.
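To illustrate why the distance limits are independent of --flank, here is a minimal Python sketch. It is not coolpup.py's actual implementation. It assumes features are reduced to midpoints on a single chromosome, that both bounds are inclusive, and that a snippet extends the flank on each side of the centre pixel; the helper names `select_pairs` and `window_pixels` are hypothetical.

```python
from itertools import combinations


def select_pairs(midpoints, mindist, maxdist):
    """Return pairs of feature midpoints whose separation (bp) lies in [mindist, maxdist]."""
    pairs = []
    for a, b in combinations(sorted(midpoints), 2):
        # Only the distance between the two features matters here.
        if mindist <= b - a <= maxdist:
            pairs.append((a, b))
    return pairs


def window_pixels(flank, resolution):
    """Side length (pixels) of a snippet spanning `flank` bp either side of the centre pixel."""
    return 2 * (flank // resolution) + 1


# Made-up feature midpoints, in base pairs.
midpoints = [1_000_000, 1_050_000, 3_000_000, 9_000_000]

pairs = select_pairs(midpoints, mindist=100_000, maxdist=5_000_000)
side = window_pixels(flank=300_000, resolution=10_000)

print(pairs)  # which feature pairs enter the pileup
print(side)   # how large each snippet is, regardless of the pair distance
```

In this sketch, changing the distance bounds changes which pairs are selected, while changing the flank only changes the size of the window drawn around each pair.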

Phlya commented 1 year ago

See here https://coolpuppy.readthedocs.io/en/latest/Examples/Walkthrough_CLI.html#by-distance-pileups

BenxiaHu commented 1 year ago

See here https://coolpuppy.readthedocs.io/en/latest/Examples/Walkthrough_CLI.html#by-distance-pileups

Thanks.

coolpup.py test.mcool::resolutions/10000 - \
    --features_format bed --by_distance --by_strand --expected test_expected_cis.tsv \
    --ignore_diags 0 --view hg38_arms.bed --flank 300000 --mindist 100000 --maxdist 102400000 \
    --outname bydistance_CTCF_pileup_bystrand_expected.clpy --nproc 2

Why does this command output so many columns for each group?

image
BenxiaHu commented 1 year ago

Yes, you can simply use the --by_distance argument for this!

Or indeed run the same command a few times separately with different --mindist --maxdist arguments, in base pairs. They are completely independent of --flank.

For short range, --by_distance 1000000 2000000 should work. For long range, what I want to plot is from 3 Mb to the maximum distance (I do not know the actual number).

Phlya commented 1 year ago

The command in the tutorial also splits by strand, so it creates a lot of pileups at the same time. You don't need that in this case.
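As a rough illustration of why splitting by strand multiplies the number of pileups, the sketch below counts the groups produced when every strand combination is crossed with every distance band. The four strand labels and the distance edges are assumptions chosen for illustration, not values taken from coolpup.py's defaults.

```python
from itertools import product

# Assumed strand combinations for a pair of stranded features.
strand_pairs = ["++", "+-", "-+", "--"]

# Hypothetical distance edges in base pairs; consecutive edges define one band.
distance_edges = [100_000, 1_000_000, 10_000_000, 100_000_000]
bands = list(zip(distance_edges[:-1], distance_edges[1:]))

# One pileup per (strand combination, distance band) when both splits are on.
with_strand = list(product(strand_pairs, bands))

print(len(bands))        # pileups when splitting by distance only
print(len(with_strand))  # pileups when also splitting by strand
```

Dropping the strand split leaves one pileup per distance band, which is all that is needed for a short-range versus long-range comparison.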

You can choose any distances you want. You can do something like --by_distance 1000000 2000000 300000000 to go up to 300 Mb, which would for sure cover everything (in "normal" mammalian genomes). Basically, just give it a number larger than the length of the longest chromosome.
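One way to pick that upper limit without guessing is to derive it from the chromosome sizes. The sketch below is only an illustration: the helper `by_distance_edges` is hypothetical, the sizes are a hard-coded subset of hg38 chromosome lengths, and rounding up to the next 100 Mb is an arbitrary choice. In practice the sizes would come from your own .cool file or chromsizes table.

```python
def by_distance_edges(chromsizes, inner_edges, step=100_000_000):
    """Append an upper edge strictly larger than the longest chromosome.

    chromsizes  -- mapping of chromosome name to length in bp
    inner_edges -- the distance edges you actually care about, in bp
    step        -- the upper edge is rounded up to a multiple of this (assumption)
    """
    longest = max(chromsizes.values())
    upper = (longest // step + 1) * step  # always exceeds `longest`
    return sorted(inner_edges) + [upper]


# Illustrative subset of hg38 chromosome lengths.
chromsizes = {"chr1": 248_956_422, "chr2": 242_193_529, "chrX": 156_040_895}

edges = by_distance_edges(chromsizes, [1_000_000, 2_000_000])
cli_argument = "--by_distance " + " ".join(str(e) for e in edges)
bands = list(zip(edges[:-1], edges[1:]))

print(cli_argument)
print(bands)
```

If the long-range band should start at 3 Mb rather than 2 Mb, adding 3000000 as an extra inner edge would produce a separate 2-3 Mb band that can simply be left out of the plot.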

Phlya commented 1 year ago

Assuming this is clear now. If not, feel free to reopen!