nservant / HiC-Pro

HiC-Pro: An optimized and flexible pipeline for Hi-C data processing

Filter diagonal short range contact #497

Open yancychy opened 2 years ago

yancychy commented 2 years ago

Hi, HiC-Pro can automatically detect valid pairs and remove self-circles (https://github.com/nf-core/hic/blob/master/docs/output.md). The paper (https://genomebiology.biomedcentral.com/articles/10.1186/s13059-015-0831-x) says: "However, one way to filter out artifacts such as self-ligation is to discard intra-chromosomal pairs below a given distance threshold [4]. HiC-Pro therefore allows these short range contacts to be filtered out". So, is it essential to filter diagonal short-range contacts in HiC-Pro when using the DpnII restriction enzyme? Thanks very much.

nservant commented 2 years ago

Hi, HiC-Pro can remove non-valid ligation products in two different ways. If you are using a restriction enzyme, you can generate the expected restriction fragments in silico and reconstruct the ligation products. You should thus be able to detect dangling ends, self-circles, etc., and remove them. However, if you are using a protocol that is not based on a restriction enzyme, such as DNase Hi-C or Micro-C, you can instead use the distance between the reads to filter out the contacts. In practice, this should allow you to filter out most of the non-valid pairs. But it does not mean that you will remove the diagonal of your matrix: that will depend on how you set the minimum distance between reads required to keep a pair. Best, N
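To make the distance-based filtering concrete, here is a minimal Python sketch. It does not use HiC-Pro itself, and it is not how HiC-Pro implements the filter. It assumes a tab-separated pairs file whose first columns are read name, chromosome 1, position 1, strand 1, chromosome 2, position 2, strand 2; that layout, the function names, and the example threshold are all assumptions for illustration. HiC-Pro's configuration file appears to expose this threshold as `MIN_CIS_DIST`; check your version's config before relying on that name.

```python
def is_short_range(chrom1, pos1, chrom2, pos2, min_cis_dist):
    """Return True if both reads lie on the same chromosome and are
    closer together than min_cis_dist (in bp)."""
    return chrom1 == chrom2 and abs(pos2 - pos1) < min_cis_dist


def filter_pairs(lines, min_cis_dist):
    """Yield only the pair lines that pass the minimum-distance filter.

    Assumed (hypothetical) layout: tab-separated columns
    name, chrom1, pos1, strand1, chrom2, pos2, strand2, ...
    """
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        chrom1, pos1 = fields[1], int(fields[2])
        chrom2, pos2 = fields[4], int(fields[5])
        # Inter-chromosomal pairs are never affected by the cis filter.
        if not is_short_range(chrom1, pos1, chrom2, pos2, min_cis_dist):
            yield line


# Made-up example pairs, not real data.
example = [
    "r1\tchr1\t1000\t+\tchr1\t1400\t-",   # cis, 400 bp apart
    "r2\tchr1\t1000\t+\tchr1\t50000\t-",  # cis, 49 kb apart
    "r3\tchr1\t1000\t+\tchr2\t1200\t-",   # trans
]
kept = list(filter_pairs(example, min_cis_dist=1000))
```

With a 1000 bp threshold, only the 400 bp pair is dropped. Any cis pair separated by more than the threshold but less than the bin size (for example 5 kb apart in a 20 kb matrix) still lands on the diagonal, which is why the filter does not by itself empty the diagonal.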

yancychy commented 2 years ago

Thanks @nservant. I guess it's better to set up a minimum distance between the reads to remove some short range pairs.