molgenis / VaSeBuilder

Validation Set Builder
GNU Lesser General Public License v3.0

Duplicate fastq records. #43

Closed TDMedina closed 5 years ago

TDMedina commented 5 years ago

The list of donor reads needs to be deduplicated after each sample is processed.

Duplicate reads can still exist if, for example, a read in a variant context has a mate mapping to a different chromosome in a second variant context. In this case, the mate will also be fetched as a read in that variant context, and the read pair will be added twice.

Because reads are currently only checked for uniqueness per variant context, these duplicates are not caught.
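As a rough illustration of the per-sample check described above, here is a minimal Python sketch. It is not VaSeBuilder's actual code: `DonorRead`, `unique_donor_reads`, and the choice of (read name, mate number) as the identity of a FASTQ record are all assumptions made for the example.

```python
from typing import Iterable, List, NamedTuple, Set, Tuple


class DonorRead(NamedTuple):
    """Hypothetical stand-in for a donor read; not a VaSeBuilder class."""
    name: str       # read name, shared by both mates of a pair
    mate: int       # 1 for the forward read, 2 for the reverse read
    sequence: str
    qualities: str


def unique_donor_reads(contexts: Iterable[Iterable[DonorRead]]) -> List[DonorRead]:
    """Collect the donor reads of all variant contexts of one sample,
    keeping only the first occurrence of each (name, mate) combination."""
    seen: Set[Tuple[str, int]] = set()
    unique: List[DonorRead] = []
    for context_reads in contexts:
        for read in context_reads:
            key = (read.name, read.mate)  # assumed identity of a FASTQ record
            if key in seen:
                continue  # already collected via another variant context
            seen.add(key)
            unique.append(read)
    return unique


# The same pair is fetched by two variant contexts on different chromosomes.
context_a = [DonorRead("r1", 1, "ACGT", "IIII"), DonorRead("r1", 2, "TTGA", "IIII")]
context_b = [DonorRead("r1", 2, "TTGA", "IIII"), DonorRead("r1", 1, "ACGT", "IIII")]
donor_reads = unique_donor_reads([context_a, context_b])
```

Checking across all contexts of a sample, rather than inside each context, is what catches the pair that was fetched twice. The first-seen order is preserved, so the output stays deterministic.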

TDMedina commented 5 years ago

Duplicate reads are now detected and removed before the FASTQ files are written.