open2c / coolpuppy

A versatile tool to perform pile-up analysis on Hi-C data in .cool format.
MIT License

[Q] With --mindist set to 0, the pileup still does not include all loops from the input loop.bedpe #145

Open Miracle-Yao opened 4 months ago

Miracle-Yao commented 4 months ago

Hi, a small question.

The loops in my input bedpe file range in size from 20 kb to 2 Mb, and I have set --mindist to 0. Why is the "Total number of piled up windows" still smaller than the number of loops? How can I plot a pileup that uses all of the loops?

thanks.

Phlya commented 4 months ago

If some of your loops are too close to ends of the chromosomes (so that the snippet with the loop would extend beyond the start/end of the chromosome), they will also be ignored. Maybe that's the reason?
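A rough way to check this on your own loop list is sketched below. It is only an illustration, not coolpuppy's actual code: the function name `loops_near_chrom_ends`, the use of anchor midpoints, and the example coordinates, chromosome size and pad value are all assumptions made for the example.

```python
def loops_near_chrom_ends(loops, chromsizes, pad):
    """Return loops whose padded window would cross a chromosome boundary.

    loops: iterable of (chrom1, start1, end1, chrom2, start2, end2) tuples
    chromsizes: dict mapping chromosome name -> length in bp
    pad: flank size in bp added on each side of each anchor midpoint
    (Hypothetical helper; the exact rule coolpuppy applies may differ.)
    """
    flagged = []
    for loop in loops:
        chrom1, start1, end1, chrom2, start2, end2 = loop
        mid1 = (start1 + end1) // 2
        mid2 = (start2 + end2) // 2
        too_close = (
            mid1 - pad < 0
            or mid2 - pad < 0
            or mid1 + pad > chromsizes[chrom1]
            or mid2 + pad > chromsizes[chrom2]
        )
        if too_close:
            flagged.append(loop)
    return flagged


# Made-up example data: one 1 Mb chromosome and three loops.
chromsizes = {"chr1": 1_000_000}
loops = [
    ("chr1", 40_000, 50_000, "chr1", 300_000, 310_000),    # left anchor near the start
    ("chr1", 400_000, 410_000, "chr1", 600_000, 610_000),  # well inside the chromosome
    ("chr1", 700_000, 710_000, "chr1", 950_000, 960_000),  # right anchor near the end
]
flagged = loops_near_chrom_ends(loops, chromsizes, pad=100_000)
print(len(flagged), "of", len(loops), "loops would extend past a chromosome end")
```

If the number of flagged loops roughly matches the gap between your loop count and the reported number of windows, chromosome ends are the likely explanation.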

Miracle-Yao commented 4 months ago

Wow, thanks for your quick reply.

If I want to compare loop strengths between two groups, would the default --mindist (2*pad+2) combined with one of the balancing options (GW_SCALE, KR, SCALE, VC, VC_SQRT, weight) be the best choice?

Phlya commented 4 months ago

Generally, mindist can be set to 0 in practice; in most datasets, --ignore-diags 2 alone is good enough to remove very short-range artifacts. If your data can't support that, you'll see some noise in the bottom-left corner of the pileup.
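To see why the default can exclude short loops, here is a small arithmetic sketch of the 2*pad+2 rule mentioned above, with pad counted in bins. The resolution, pad and loop sizes are made-up values, and `default_mindist_bp` is a hypothetical helper, not a coolpuppy function; the exact rounding and comparison used internally may differ.

```python
def default_mindist_bp(pad_bp, resolution_bp):
    """Default minimal distance in bp, assuming mindist = 2*pad + 2 in bins."""
    pad_bins = pad_bp // resolution_bp
    return (2 * pad_bins + 2) * resolution_bp


# Assumed example: 100 kb pad at 10 kb resolution.
mindist = default_mindist_bp(pad_bp=100_000, resolution_bp=10_000)

# Made-up loop sizes spanning the 20 kb to 2 Mb range from the question.
loop_sizes = [20_000, 150_000, 500_000, 2_000_000]
kept = [size for size in loop_sizes if size > mindist]

print("default mindist (bp):", mindist)
print("loops longer than the default mindist:", kept)
```

With these numbers the default works out to 220 kb, so the 20 kb and 150 kb loops would be skipped, which is why setting --mindist to 0 matters when the loop list includes short loops.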

Assuming that by "weight" you mean the output of cooler balance, it's the safest option. The default filters in cooler remove more artifacts, whereas juicer is very lenient and retains many bins with extreme coverage variation. Also, FYI, when looking at loops, using mapq>30 filtering is quite important.