fergsc closed 3 years ago
Hi @fergsc, PCR duplicates provide limited information for Hi-C scaffolding and may even introduce noise. It is therefore better to remove these reads before running the ALLHiC pipeline.
Thanks, I shall remove them during alignment.
@tangerzhang Based on the source code on GitHub, the duplicate-removal step appears to be commented out now, right? So you mean that, regardless of whether the genome is diploid or polyploid, it is recommended to discard duplicated reads? I am confused, because the author's note in the source code says that duplicates are not removed: "NOTE: As of August 24, 2013, I'm no longer removing PCR duplicates..."
Hi @theshowmustgolangon, I apologize for my confusing answer. As far as I know, many sequencing companies now use a PCR-free approach to library construction, so in theory the sequencing data should contain few PCR duplicates, and those few should have limited influence on the Hi-C scaffolding results. Yes, you can remove PCR-duplicated reads, but it may not be necessary, since recently sequenced reads do not contain many PCR duplicates.
Hi, I am attempting to use the ALLHiC pipeline on a highly repetitive diploid plant genome that I have assembled. My Hi-C data contains a LOT of PCR duplication. Is it recommended to remove these reads before running the ALLHiC pipeline?
Thanks.
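For anyone reading later who wonders what "removing PCR duplicates" amounts to for Hi-C read pairs, here is a minimal Python sketch of the idea. It is not what ALLHiC does internally, and the `ReadPair` record, `pair_key`, and `drop_duplicates` are hypothetical names made up for illustration. It assumes two pairs are duplicates when both ends map to the same contig, position, and strand, regardless of which end is listed first. In practice this is usually done on the BAM file with a dedicated tool such as `samtools markdup` or Picard MarkDuplicates.

```python
from typing import Iterable, Iterator, NamedTuple, Tuple


class ReadPair(NamedTuple):
    """Hypothetical record for one aligned Hi-C read pair (illustration only)."""
    name: str
    chrom1: str
    pos1: int
    strand1: str
    chrom2: str
    pos2: int
    strand2: str


def pair_key(pair: ReadPair) -> Tuple:
    """Build an orientation-independent key from the two mapped ends."""
    end1 = (pair.chrom1, pair.pos1, pair.strand1)
    end2 = (pair.chrom2, pair.pos2, pair.strand2)
    # Sort the two ends so that (A, B) and (B, A) give the same key.
    return tuple(sorted((end1, end2)))


def drop_duplicates(pairs: Iterable[ReadPair]) -> Iterator[ReadPair]:
    """Yield only the first pair seen for each distinct pair of mapped ends."""
    seen = set()
    for pair in pairs:
        key = pair_key(pair)
        if key in seen:
            continue  # same ends already seen: treated as a PCR duplicate
        seen.add(key)
        yield pair


# Toy data: r2 is an exact duplicate of r1, r3 is r1 with its ends swapped,
# and r4 differs from r1 by one base at the first end.
pairs = [
    ReadPair("r1", "ctg1", 100, "+", "ctg2", 500, "-"),
    ReadPair("r2", "ctg1", 100, "+", "ctg2", 500, "-"),
    ReadPair("r3", "ctg2", 500, "-", "ctg1", 100, "+"),
    ReadPair("r4", "ctg1", 101, "+", "ctg2", 500, "-"),
]
kept = [p.name for p in drop_duplicates(pairs)]
print(kept)
```

With the toy data above, only `r1` and `r4` are kept. Keeping the first occurrence is an arbitrary choice in this sketch; real tools typically keep the pair with the best base or mapping quality.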