open2c / coolpuppy

A versatile tool to perform pile-up analysis on Hi-C data in .cool format.
MIT License

Extracting locus strength scores for loops #143

Open NMaziak opened 5 months ago

NMaziak commented 5 months ago

Hello,

I'm doing aggregate loop analysis and I was wondering if you have something like this for loops? I'm not very well versed in python, but will try on my own over the week.

I do think one feature that would be top notch for this tool is a command-line option that lets you extract the regions used in the aggregate (for example when running with something like --by_distance), together with the score of each region, as a BED or text file.

Thanks again for developing such a nice tool! Best, Noura

efriman commented 5 months ago

Hi Noura,

I'm not sure exactly what you mean, but if you want the distribution of contact scores between regions, you can run coolpup.py with --store_stripes, which saves the individual coordinates and values. Extracting the values is a bit convoluted, but I have an example here: https://github.com/efriman/Friman_etal_ULI/blob/main/pileups/TSS_quartiles_score_stripes.ipynb

There's actually also an option, --out_sorted_bedpe, in plotpup.py that outputs a bedpe file with your stripes sorted by either the central pixel (--stripe_sort center_pixel) or the sum (--stripe_sort sum). See also: https://coolpuppy.readthedocs.io/en/latest/Examples/Walkthrough_CLI.html#stripe-stackups
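To make the difference between the two sort keys concrete, here is a minimal NumPy sketch. It does not read coolpup.py output: the layout (one row per region pair, one column per pixel along the stripe), the function name `sort_stripes`, and the descending order are all assumptions made for illustration, not plotpup.py's actual implementation.

```python
import numpy as np


def sort_stripes(stripes, by="center_pixel"):
    """Sort stripes by their central pixel or by their sum (descending).

    `stripes` is assumed to be a 2D array with one row per region pair and
    one column per pixel along the stripe. This layout is hypothetical.
    """
    stripes = np.asarray(stripes, dtype=float)
    if by == "center_pixel":
        # Score each stripe by the pixel in the middle column.
        scores = stripes[:, stripes.shape[1] // 2]
    elif by == "sum":
        # Score each stripe by the sum of its pixels, ignoring NaNs.
        scores = np.nansum(stripes, axis=1)
    else:
        raise ValueError(f"unknown sort key: {by!r}")
    # Stable sort so that ties keep their original relative order.
    order = np.argsort(-scores, kind="stable")
    return order, stripes[order]


toy = np.array(
    [
        [1.0, 5.0, 1.0],  # strongest centre, sum 7
        [3.0, 2.0, 4.0],  # weakest centre, largest sum (9)
        [2.0, 3.0, 2.0],  # intermediate centre, sum 7
    ]
)

order_center, _ = sort_stripes(toy, by="center_pixel")
order_sum, _ = sort_stripes(toy, by="sum")
print("by center pixel:", order_center.tolist())
print("by sum:", order_sum.tolist())
```

The toy data is chosen so that the two keys disagree: the row with the weakest central pixel has the largest sum, so it ranks last under one key and first under the other.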

NMaziak commented 5 months ago

Hello,

Thanks for the reply! I'll read up and test out both. I don't know how I overlooked --out_sorted_bedpe, sorry about that. I guess I have just been a bit confused about some of the results. I've used coolpup.py to look at the general trends in my called loops:

        "coolpup.py {input.mcool}::resolutions/{wildcards.res} "
        "{input.loops} "
        "--features_format bedpe "
        "--expected {input.expected} "
        "--mindist 2000 "
        "--by_distance 0 100000 500000 2000000 "
        "--ignore_diags 0 " #changing this value has little to no effect
        "--nproc {threads} "
        "--flank {wildcards.flank} "
        "--outname {output}"

I'm getting different trends depending on the resolution I use. I wanted to see whether this was a general trend in the scores being averaged or whether some loci were causing trouble. I attached a picture. For example, loops separated by distances of 0.1-0.5 Mb decrease until sample 4 at 500 bp resolution, but at 2 kb resolution they (mostly) stop decreasing in strength right after sample 1.
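One way to check which loops end up in which distance band is to assign the bands yourself. The sketch below uses pandas on a small made-up BEDPE-like table; the loop coordinates are invented, and two details are assumptions rather than confirmed coolpup.py behaviour: that distance is measured between anchor midpoints, and that each band includes its lower edge and excludes its upper edge.

```python
import pandas as pd

# Band edges taken from the --by_distance values in the command above.
edges = [0, 100_000, 500_000, 2_000_000]

# Made-up loops in BEDPE-like columns, for illustration only.
loops = pd.DataFrame(
    {
        "chrom1": ["chr1", "chr1", "chr2"],
        "start1": [1_000_000, 5_000_000, 200_000],
        "end1": [1_002_000, 5_002_000, 202_000],
        "chrom2": ["chr1", "chr1", "chr2"],
        "start2": [1_050_000, 5_300_000, 1_700_000],
        "end2": [1_052_000, 5_302_000, 1_702_000],
    }
)

# Assumption: distance is measured between the midpoints of the two anchors.
mid1 = (loops["start1"] + loops["end1"]) // 2
mid2 = (loops["start2"] + loops["end2"]) // 2
loops["distance"] = (mid2 - mid1).abs()

# Assumption: bands are half-open, [lower, upper).
loops["band"] = pd.cut(loops["distance"], bins=edges, right=False)

# Number of loops per band, in band order.
band_counts = loops["band"].value_counts(sort=False)
print(loops[["chrom1", "start1", "start2", "distance", "band"]])
print(band_counts)
```

Running the same assignment on the real loop list shows how many loops contribute to each band, which helps judge whether a band's average is driven by only a few loci.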

These scores are based on the mean of the central 3x3 square of the pileup, correct? Could increasing this with --center in plotpup.py help? Best, Noura
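For reference, the effect of the square size on such a score can be sketched in a few lines of NumPy. This is an illustration of "mean of the central n x n pixels" under stated assumptions, not coolpuppy's own code: the function name `central_mean` and the toy pileup are hypothetical, and the pileup is assumed to be a square matrix with an odd side length.

```python
import numpy as np


def central_mean(pileup, size=3):
    """Mean of the central `size` x `size` square of a square pileup matrix.

    Assumes an odd `size` and a square 2D matrix; NaNs are ignored.
    """
    pileup = np.asarray(pileup, dtype=float)
    if pileup.ndim != 2 or pileup.shape[0] != pileup.shape[1]:
        raise ValueError("pileup must be a square 2D matrix")
    if size < 1 or size % 2 == 0:
        raise ValueError("size must be a positive odd integer")
    c = pileup.shape[0] // 2  # index of the central pixel
    half = size // 2
    if half > c:
        raise ValueError("size is larger than the pileup")
    window = pileup[c - half : c + half + 1, c - half : c + half + 1]
    return float(np.nanmean(window))


# Toy pileup: flat background of 1 with a single strong central pixel.
toy = np.ones((5, 5))
toy[2, 2] = 10.0

score_1 = central_mean(toy, size=1)
score_3 = central_mean(toy, size=3)
score_5 = central_mean(toy, size=5)
print(score_1, score_3, score_5)
```

With a sharp peak like this, a larger square dilutes the score with background pixels; with a broad or slightly off-centre signal, a larger square can instead make the score less sensitive to the exact pixel, which is why the best size can depend on resolution.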

efriman commented 5 months ago

Hi Noura, you can of course change the --center option and see if that makes a difference, and generate stripes to get distributions of values at different resolutions, etc. But mostly what you are asking relates to your data, i.e. your called loops and your coolers, not coolpup.py. If you have any technical questions about how to use coolpup.py, we are happy to help of course!