snakemake-workflows / dna-seq-varlociraptor

A Snakemake workflow for calling small and structural variants under any kind of scenario (tumor/normal, tumor/normal/relapse, germline, pedigree, populations) via the unified statistical model of Varlociraptor.
MIT License

feat: handle umis #213

Closed FelixMoelder closed 1 year ago

FelixMoelder commented 1 year ago

Until now, UMIs were only supported by adding a FASTQ file containing the UMI of each read. Often, UMIs do not exist as separate FASTQ records but are part of the read sequences. To handle UMIs properly, information about them is now stored in two additional columns in the sample sheet.

Handling UMIs is optional. If the umi_read column is missing or left empty, UMIs will not be annotated for duplicate marking or consensus read calculation.
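To illustrate the opt-in behaviour, here is a minimal Python sketch of the check, not the workflow's actual code. Only the umi_read column name comes from the description above. The second column name (umi_read_structure), the sample names, and all cell values are assumptions made for the example.

```python
import csv
import io

# Illustrative sample sheet. Only the `umi_read` column name is taken from the
# PR description; `umi_read_structure` and all values are assumptions.
SAMPLES_TSV = (
    "sample_name\tumi_read\tumi_read_structure\n"
    "A\tfq1\t8M+T\n"
    "B\t\t\n"
)


def uses_umis(row):
    """Return True if the row has a non-empty `umi_read` value.

    A missing column (key absent) and an empty cell are treated the same:
    UMI handling is skipped for that sample.
    """
    value = row.get("umi_read")
    return value is not None and value.strip() != ""


rows = list(csv.DictReader(io.StringIO(SAMPLES_TSV), delimiter="\t"))
umi_samples = [row["sample_name"] for row in rows if uses_umis(row)]
print(umi_samples)
```

In this sketch, sample B is processed without UMI annotation because its umi_read cell is empty, and a sheet lacking the column entirely behaves the same way.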

FelixMoelder commented 1 year ago

> We should discuss how this is joined with the separate umi fastq case. How is that configured now?

This still works. If there is a separate FASTQ file with UMIs, one can just define that file and set the read structure to +M, which defines the whole sequence in that FASTQ as the UMI.
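As a rough illustration of what a read structure such as +M means, the following Python sketch splits a read into segments and collects the UMI bases. It assumes the fgbio-style notation, where each segment is a length (or + for "all remaining bases", allowed only in the last segment) followed by a type: T for template, B for sample barcode, M for molecular barcode (UMI), S for skipped bases. The function names are hypothetical and not part of the workflow.

```python
import re


def parse_read_structure(structure):
    """Parse a read structure like '8M+T' into (length, kind) pairs.

    A length of None stands for '+', i.e. all remaining bases.
    """
    if not re.fullmatch(r"(?:(?:\d+|\+)[TBMS])+", structure):
        raise ValueError(f"invalid read structure: {structure!r}")
    segments = re.findall(r"(\d+|\+)([TBMS])", structure)
    # '+' is only meaningful for the final segment.
    if any(length == "+" for length, _ in segments[:-1]):
        raise ValueError("'+' is only allowed in the last segment")
    return [(None if length == "+" else int(length), kind)
            for length, kind in segments]


def extract_umi(sequence, structure):
    """Return the bases covered by M segments of the read structure.

    Simplification for this sketch: multiple M segments are concatenated.
    """
    umi_parts = []
    offset = 0
    for length, kind in parse_read_structure(structure):
        end = len(sequence) if length is None else offset + length
        if kind == "M":
            umi_parts.append(sequence[offset:end])
        offset = end
    return "".join(umi_parts)


# Separate UMI FASTQ: the whole record is the UMI.
print(extract_umi("ACGTACGT", "+M"))
# Inline UMI: the first 8 bases are the UMI, the rest is template.
print(extract_umi("ACGTACGTTTTT", "8M+T"))
```

With +M the entire sequence of the separate FASTQ record is returned as the UMI, whereas a structure like 8M+T takes only the leading bases of the read itself.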

johanneskoester commented 1 year ago

> We should discuss how this is joined with the separate umi fastq case. How is that configured now?
>
> This still works. If there is a separate FASTQ file with UMIs, one can just define that file and set the read structure to +M, which defines the whole sequence in that FASTQ as the UMI.

Can you update config/README.md to describe all ways to configure UMIs please?