feat: handle umis

FelixMoelder commented 1 year ago

Until now, UMIs were only supported by adding a fastq file containing the UMI of each read. Often, UMIs do not exist as separate fastq records but are part of the read sequences. To handle UMIs properly, information about them is now stored in two additional columns in the samplesheet:

umi_read: Defines whether UMIs are part of records in fq1 or fq2.
umi_read_structure: The template of the read defining the position of UMIs in records (see https://github.com/fulcrumgenomics/fgbio/wiki/Read-Structures)

Handling UMIs is optional. If the umi_read column is missing or left empty, UMIs will not be annotated for duplicate marking or consensus read calculation.
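
To illustrate what a value in umi_read_structure expresses, here is a minimal sketch in plain Python. It is not how the workflow itself processes reads; the function names parse_read_structure and split_umi are hypothetical. It assumes only the segment kinds described on the linked fgbio wiki page (T = template, B = sample barcode, M = molecular barcode, S = skip) and does not check that the read is long enough for the structure.

```python
import re

# Segment kinds as described on the fgbio read-structure wiki page:
# T = template, B = sample barcode, M = molecular barcode (UMI), S = skip.
SEGMENT = re.compile(r"(\d+|\+)([TBMS])")


def parse_read_structure(structure):
    """Split a read structure such as '8M+T' into (length, kind) pairs.

    A length of None stands for '+', meaning "all remaining bases".
    """
    segments = []
    pos = 0
    for match in SEGMENT.finditer(structure):
        if match.start() != pos:
            # There are characters between segments that we cannot interpret.
            raise ValueError(f"invalid read structure: {structure!r}")
        length, kind = match.groups()
        segments.append((None if length == "+" else int(length), kind))
        pos = match.end()
    if not segments or pos != len(structure):
        raise ValueError(f"invalid read structure: {structure!r}")
    if any(length is None for length, _ in segments[:-1]):
        raise ValueError("'+' is only allowed in the last segment")
    return segments


def split_umi(sequence, structure):
    """Return (umi, template) for a single read sequence."""
    umi, template = [], []
    offset = 0
    for length, kind in parse_read_structure(structure):
        end = len(sequence) if length is None else offset + length
        part = sequence[offset:end]
        if kind == "M":
            umi.append(part)
        elif kind == "T":
            template.append(part)
        # B and S segments are neither UMI nor template and are dropped here.
        offset = end
    return "".join(umi), "".join(template)


# The first 8 bases are the UMI, the rest is template.
print(split_umi("ACGTACGTTTGGCCAA", "8M+T"))  # ('ACGTACGT', 'TTGGCCAA')
```

With a structure like 8M+T the UMI sits at the start of the read, so umi_read would point at whichever of fq1 or fq2 carries those bases.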

FelixMoelder commented 1 year ago

We should discuss how this is joined with the separate umi fastq case. How is that configured now?

This still works. If there is a separate fastq file with UMIs, one can just define that file and set the read structure to +M, which declares the whole sequence of each record in that fastq to be the UMI.

johanneskoester commented 1 year ago

Can you update config/README.md to describe all ways to configure UMIs please?

snakemake-workflows / dna-seq-varlociraptor

feat: handle umis #213