UMI data is sometimes stored in a third FASTQ file, typically because the UMI is embedded in the index and bcl2fastq2 cannot combine it into the forward or reverse reads. This produces three files:
R1: Forward
R2: UMI
R3: Reverse
We can support UMI processing using this method. Key points:
Allow the samplesheet to include an optional third FASTQ file
Ignore or otherwise handle the UMI FASTQ when passing reads to modules that expect two FASTQ files, to ensure compatibility
Allow/enforce --umi_read_structure to support >2 masks.
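To make the three key points concrete, here is a rough Python sketch of the logic (not pipeline code, which would be Nextflow). Everything in it is an assumption for illustration: the `fastq_3` column name, the helper names, and the idea that `--umi_read_structure` takes space-separated fgbio-style masks with one mask per FASTQ file.

```python
from typing import List, NamedTuple, Optional


class Sample(NamedTuple):
    sample: str
    fastq_1: str            # forward reads (R1)
    fastq_2: str            # UMI reads (R2) if fastq_3 is set, otherwise reverse reads
    fastq_3: Optional[str]  # reverse reads (R3); None for a regular two-file sample


def parse_row(row: dict) -> Sample:
    """Build a Sample from a samplesheet row; the (hypothetical) fastq_3 column is optional."""
    fastq_3 = (row.get("fastq_3") or "").strip() or None
    return Sample(row["sample"], row["fastq_1"], row["fastq_2"], fastq_3)


def template_fastqs(sample: Sample) -> List[str]:
    """Return the two FASTQs for modules that expect exactly two files (UMI file dropped)."""
    if sample.fastq_3 is None:
        return [sample.fastq_1, sample.fastq_2]
    return [sample.fastq_1, sample.fastq_3]


def check_read_structure(sample: Sample, read_structure: str) -> List[str]:
    """Split the read structure into masks and require one mask per FASTQ file."""
    masks = read_structure.split()
    n_fastqs = 3 if sample.fastq_3 else 2
    if len(masks) != n_fastqs:
        raise ValueError(
            f"{sample.sample}: expected {n_fastqs} masks, got {len(masks)}"
        )
    return masks
```

With a three-file sample, `template_fastqs` returns R1 and R3 only, and `check_read_structure` accepts three masks such as `+T +M +T` while rejecting two.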
Nice, yes, I think this is already planned for as soon as we have the shared subworkflow. (I think there is a PR in the modules repo somewhere; we should cross-check that it includes this.)
See https://github.com/nf-core/fastquorum/pull/11 for an example implementation. This could be implemented as part of an nf-core subworkflow.