jpuritz / dDocent

a bash pipeline for RAD sequencing
ddocent.com
MIT License

remove duplicates prior to variant calling #78

Closed: pdimens closed this issue 3 years ago

pdimens commented 3 years ago

I've been reading up on Freebayes and some best practices and found several recommendations (1,2) to run samtools markdup on the alignment .bam files before variant calling. Once duplicates are marked, freebayes skips over the alignments flagged as duplicates.

Is there a reason dDocent doesn't perform this post-alignment processing step?
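For context, "marked as duplicates" means a bit in each alignment's SAM FLAG field: bit 0x400 (1024) flags a read as a PCR or optical duplicate, and freebayes excludes alignments with that bit set unless told otherwise. The Python sketch below shows that filter on raw FLAG integers. The function names are hypothetical and only illustrate the bit test; they are not part of freebayes, samtools, or dDocent.

```python
# SAM FLAG bit 0x400 (1024) marks a read as a PCR or optical duplicate.
# Tools that honour this bit (freebayes does by default) skip such alignments.
DUPLICATE_FLAG = 0x400


def is_duplicate(flag: int) -> bool:
    """Return True if the SAM FLAG value has the duplicate bit set."""
    return bool(flag & DUPLICATE_FLAG)


def keep_for_calling(flags):
    """Hypothetical helper: drop duplicate-marked reads from a list of FLAG values."""
    return [f for f in flags if not is_duplicate(f)]


if __name__ == "__main__":
    # 99 and 147 are a typical properly paired read and its mate;
    # 1123 and 1171 are the same two flags with the duplicate bit (1024) added.
    flags = [99, 147, 1123, 1171]
    print(keep_for_calling(flags))  # [99, 147]
```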

jpuritz commented 3 years ago

This type of duplicate detection will not work for ddRAD libraries. It relies solely on mapping coordinates: any PE reads with the same start and end coordinates are called duplicates. When all your reads start at one of two restriction enzyme (RE) sites, this would mark nearly all reads as duplicates.
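The effect can be illustrated with a toy model, sketched in Python under simplifying assumptions: a read pair is reduced to `(chrom, start, end)`, the first pair at each coordinate key is kept, and every later pair with the same key is flagged. Real tools such as samtools markdup also account for read orientation and choose which copy to keep by quality. The helper names and the simulated data are hypothetical. In the ddRAD case every pair at a locus spans the same two RE sites, so all but one pair per locus is flagged. Randomly sheared fragments rarely share both ends.

```python
import random


def mark_coordinate_duplicates(pairs):
    """Flag pairs that share (chrom, start, end) with an earlier pair.

    Simplified model of coordinate-based duplicate marking; orientation
    and quality-based selection of the kept copy are ignored here.
    """
    seen = set()
    flags = []
    for chrom, start, end in pairs:
        key = (chrom, start, end)
        flags.append(key in seen)
        seen.add(key)
    return flags


def duplicate_fraction(pairs):
    """Fraction of pairs flagged as duplicates."""
    flags = mark_coordinate_duplicates(pairs)
    return sum(flags) / len(flags)


def make_ddrad_pairs(n_loci=10, depth=100):
    """Hypothetical ddRAD data: every pair at a locus spans the same two RE sites."""
    pairs = []
    for locus in range(n_loci):
        start = 1_000 * locus          # first RE site
        end = start + 400              # second RE site
        pairs.extend([("chr1", start, end)] * depth)
    return pairs


def make_sheared_pairs(n=1_000, genome_size=10_000_000, seed=1):
    """Hypothetical shotgun data: random start positions and fragment lengths."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        start = rng.randrange(genome_size)
        pairs.append(("chr1", start, start + rng.randint(300, 500)))
    return pairs


if __name__ == "__main__":
    print(duplicate_fraction(make_ddrad_pairs()))    # 0.99
    print(duplicate_fraction(make_sheared_pairs()))  # close to 0
```

With 10 loci at 100 pairs each, only one pair per locus survives, which matches the point above: on ddRAD data, coordinate-based marking treats almost every read as a duplicate.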

pdimens commented 3 years ago

Ah, gotcha. Thanks for the explanation.