Status: Closed (anands-repo closed this 4 years ago)
Hi @anands-repo
The error profiles of WGS and WES differ in many ways, including capture kit efficiency, additional errors introduced by PCR during library preparation, the fraction of on-target reads, greater coverage variability in exomes, and probably other factors that are not completely understood. This paper is a good place to start on some of the factors that differ between the assays: https://www.pnas.org/content/112/17/5473
For deduplication, we use Picard MarkDuplicates as run by GATK. We observe only negligible differences in variant call quality with and without MarkDuplicates, and these become observable only at lower coverages (15x-22x). This is one reason we list MarkDuplicates as an optional step in our BestPractices.
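For reference, here is a minimal sketch of how a Picard MarkDuplicates run through the GATK wrapper could be assembled from Python. It is not the exact command used to produce the training BAMs, which is not stated in this thread. The helper name `build_markduplicates_cmd` and all file names are hypothetical; `-I`, `-O`, and `-M` are the tool's input, output, and metrics arguments.

```python
import shlex
from typing import List


def build_markduplicates_cmd(
    input_bam: str,
    output_bam: str,
    metrics_file: str,
    gatk: str = "gatk",  # assumed: the GATK wrapper script is on PATH
) -> List[str]:
    """Build the argument list for Picard MarkDuplicates run via GATK.

    This only constructs the command; nothing is executed here.
    """
    return [
        gatk, "MarkDuplicates",
        "-I", input_bam,     # coordinate-sorted input BAM
        "-O", output_bam,    # output BAM with duplicates flagged
        "-M", metrics_file,  # duplication metrics report
    ]


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    cmd = build_markduplicates_cmd(
        "sample.bam",
        "sample.deduplicated.bam",
        "sample.dup_metrics.txt",
    )
    print(shlex.join(cmd))
    # To actually run it (requires GATK to be installed):
    #   import subprocess
    #   subprocess.run(cmd, check=True)
```

By default MarkDuplicates only flags duplicate reads rather than removing them, so downstream tools decide whether to skip them.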
Thanks for the material and the answers!
I wonder what the difference in the error model is between WGS and WES. Is it simply the way coverage varies, or are there differences in the error rates or error types as well?
Also, for the open training data, I noticed that the BAM files are named *deduplicated.bam. What method was used to mark duplicates? Is it GATK?