tfwillems / HipSTR

Genotype and phase short tandem repeats using Illumina whole-genome sequencing data
GNU General Public License v2.0

Illumina adapter trimming #60

Closed tfwillems closed 5 years ago

tfwillems commented 5 years ago

I've added functionality that automatically searches for common Illumina adapter sequences and trims them from the alignments prior to genotyping. The trimming accounts for the orientation of the original read: adapters are trimmed from the correct end of each alignment, using either the original Illumina adapter sequence or its reverse complement, depending on that orientation.
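To illustrate the orientation logic, here is a minimal Python sketch. It is not HipSTR's actual implementation (HipSTR is written in C++, and its matching rules are not described here). The adapter string is the widely used Illumina TruSeq adapter prefix, chosen as an assumption; the function names are hypothetical. The sketch uses exact matching only and ignores base qualities and CIGAR updates, which a real trimmer would have to handle.

```python
# Assumption: the common Illumina TruSeq adapter prefix. HipSTR may search
# for a different or larger set of adapter sequences.
ADAPTER = "AGATCGGAAGAGC"

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T, N only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def trim_adapter(aligned_seq, is_reverse, adapter=ADAPTER):
    """Trim an exact adapter match from the appropriate end (hypothetical helper).

    aligned_seq is the sequence as stored in the alignment, i.e. already
    reverse-complemented if the read mapped to the reverse strand.
    """
    if not is_reverse:
        # Forward-strand alignment: the adapter follows the insert, so drop
        # everything from the first adapter occurrence to the right end.
        idx = aligned_seq.find(adapter)
        return aligned_seq if idx == -1 else aligned_seq[:idx]

    # Reverse-strand alignment: the stored sequence is reverse-complemented,
    # so the adapter's reverse complement precedes the insert. Drop everything
    # up to and including its last occurrence.
    rc_adapter = reverse_complement(adapter)
    idx = aligned_seq.rfind(rc_adapter)
    return aligned_seq if idx == -1 else aligned_seq[idx + len(rc_adapter):]


if __name__ == "__main__":
    insert = "ACGTACGTTT"
    original_read = insert + ADAPTER + "GGG"
    # Forward alignment stores the read as sequenced.
    print(trim_adapter(original_read, is_reverse=False))
    # Reverse alignment stores the reverse complement of the read.
    print(trim_adapter(reverse_complement(original_read), is_reverse=True))
```

Using `find` on the forward strand and `rfind` on the reverse strand keeps the two cases mirror images of each other: both cut at the adapter occurrence nearest the start of the original read.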

This functionality is key to improving genotyping accuracy in datasets where adapter contamination is fairly high (> 1%). In these datasets, adapter sequences are aligned as insertions in the flanking sequences, which results in high DFLANKINDEL counts and occasionally causes the assembly of the flanking sequences to fail.

Because the trimming is performed by default, HipSTR should be more robust to this fairly common sequencing artifact.