nservant / HiC-Pro

HiC-Pro: An optimized and flexible pipeline for Hi-C data processing

Multiple Pairs #51

Closed ustcahwry closed 8 years ago

ustcahwry commented 8 years ago

Hi, when I used mergeSAM.py to merge the two mapped read files and get the pairing statistics, the results showed that 80% of my pairs are multiple pairs, which is not reasonable. I checked the script and found that the "read.is_unique" function only works for Bowtie2-mapped files. Since I aligned the reads with SNAP rather than Bowtie2, could you please tell me whether there is another way to identify multiple pairs? Thanks a lot!

Best

nservant commented 8 years ago

Hi, yes indeed, I developed this script for bowtie2. If you want to use another mapper, you will have to update the script. I think that most of the functions are common to many mappers, but some of them, such as the detection of multiple hits, are mapper-specific. Or maybe you can try to switch off the RM_MULTI option in the config file. In this case, multiple hits should be kept. I guess the stat file will still be wrong, but at least the reads should not be discarded for the downstream analysis. N
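For readers adapting the uniqueness test to another mapper, here is a minimal sketch of one possible approach. It is not the code from mergeSAM.py. It assumes that a Bowtie2-style check compares the AS (best alignment score) and XS (second-best score) tags, and that a mapper which does not emit those tags can be handled with a MAPQ cutoff. The `Alignment` class, the `is_unique` helper and the `min_mapq` default of 10 are all hypothetical; check how your mapper reports ambiguous hits before relying on any threshold.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Alignment:
    """Hypothetical stand-in for one SAM record (not a pysam object)."""
    is_unmapped: bool
    mapq: int = 0
    tags: Dict[str, int] = field(default_factory=dict)


def is_unique(aln: Alignment, min_mapq: int = 10) -> bool:
    """Return True if the alignment looks uniquely mapped.

    Assumption: when an AS tag is present, the read is unique if there is
    no XS tag or if AS is strictly greater than XS. Otherwise fall back to
    a MAPQ cutoff (min_mapq is an illustrative value, not a recommendation).
    """
    if aln.is_unmapped:
        return False
    best = aln.tags.get("AS")
    if best is not None:
        second = aln.tags.get("XS")
        return second is None or best > second
    # No alignment-score tags: rely on the mapping quality instead.
    return aln.mapq >= min_mapq
```

With this sketch, a record carrying equal AS and XS scores is treated as a multiple hit, while a record without either tag is judged only by its MAPQ.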

ustcahwry commented 8 years ago

Hi, thanks a lot for your kind advice. I've solved the multiple hits problem. But after I merged the valid pairs and ran the quality controls, I found that about 89% of the valid pairs in my Hi-C library are PCR duplicates. (image attached) Could you please help me with this? Many thanks

nservant commented 8 years ago

Hmm ... duplicates are detected from the list of valid 3C products. All interactions where read1 and read2 map at exactly the same positions are considered duplicates. Duplicates are usually PCR artefacts, so we advise removing them. Do you have any biological or technical explanation first? Then, if you really want to keep them, just turn off the option by setting RM_DUP=0 ... Nicolas
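To make the duplicate rule above concrete, here is a minimal sketch of how the duplicate fraction could be checked independently. It is not HiC-Pro's implementation. It assumes each valid pair has already been reduced to a tuple of (chromosome 1, position 1, chromosome 2, position 2), and the `remove_duplicates` helper is hypothetical.

```python
from typing import Iterable, List, Tuple

# Assumed layout of one valid pair: (chrom1, pos1, chrom2, pos2).
Pair = Tuple[str, int, str, int]


def remove_duplicates(pairs: Iterable[Pair]) -> Tuple[List[Pair], float]:
    """Keep the first occurrence of each pair and report the duplicate fraction.

    Two pairs count as duplicates when both reads map at exactly the same
    positions, i.e. when the whole tuple is identical.
    """
    seen = set()
    kept: List[Pair] = []
    total = 0
    for pair in pairs:
        total += 1
        if pair not in seen:
            seen.add(pair)
            kept.append(pair)
    dup_fraction = 0.0 if total == 0 else 1 - len(kept) / total
    return kept, dup_fraction
```

A very high duplicate fraction from a check like this would point to the library itself (for example low complexity or over-amplification) rather than to the pipeline's counting.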