o-william-white / skim2mito

A snakemake pipeline for the batch assembly, annotation, and phylogenetic analysis of mitochondrial genomes from genome skims
MIT License

Multiple read file pairs per sample #9

Open martinstervander opened 1 year ago

martinstervander commented 1 year ago

A single sample (specimen) is often split into multiple libraries with different UMIs. Even if these libraries are all pooled and sequenced together, the reads are demultiplexed into UMI-specific libraries, so a single sample may be represented by any number of paired R1+R2 fastq files. In most such cases, one would want to use the reads from all of the libraries and combine them into a single assembly.

This can be solved by concatenating the R1 files and the R2 files before running the pipeline, but it would be much more efficient and user-friendly if the pipeline processed together all lines of the sample csv that share the same sample ID (each line with different R1 and R2 fastq files), producing a single output that represents all libraries of that sample.
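A minimal Python sketch of both ideas is below: grouping sample csv lines by sample ID, and the concatenation workaround. It assumes a sample sheet with hypothetical columns `ID`, `forward` and `reverse`; the real skim2mito column names may differ, and the function names are illustrative only. Byte-wise concatenation works for plain fastq and for gzipped fastq (concatenated gzip members form a valid multi-member gzip stream). The R1 and R2 files of a sample must be concatenated in the same library order so that read pairs stay in sync.

```python
import csv
import shutil
from pathlib import Path


def group_read_pairs(csv_path, id_col="ID", r1_col="forward", r2_col="reverse"):
    """Group sample csv lines by sample ID.

    Column names are hypothetical placeholders; adjust them to the real sheet.
    Returns {sample_id: [(r1, r2), ...]}, keeping the order lines first appear.
    """
    groups = {}  # plain dicts preserve insertion order (Python 3.7+)
    with open(csv_path, newline="") as handle:
        for row in csv.DictReader(handle):
            groups.setdefault(row[id_col], []).append((row[r1_col], row[r2_col]))
    return groups


def concatenate(paths, out_path):
    """Concatenate files byte by byte (valid for fastq and for fastq.gz)."""
    with open(out_path, "wb") as out:
        for path in paths:
            with open(path, "rb") as src:
                shutil.copyfileobj(src, out)


def merge_sample(sample_id, pairs, out_dir, suffix=".fastq.gz"):
    """Write one merged R1 and one merged R2 file for a sample.

    R1s and R2s are taken from the same list of pairs, so both outputs
    contain the libraries in the same order and mate pairing is preserved.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    r1_out = out_dir / f"{sample_id}_R1{suffix}"
    r2_out = out_dir / f"{sample_id}_R2{suffix}"
    concatenate([r1 for r1, _ in pairs], r1_out)
    concatenate([r2 for _, r2 in pairs], r2_out)
    return r1_out, r2_out
```

Inside the pipeline itself, a grouping like `group_read_pairs` could instead feed a Snakemake input function that takes `wildcards` and returns the list of R1 (or R2) files for `wildcards.sample`, avoiding the extra copy on disk; whether the existing skim2mito rules can accept a list of files per sample this way is an assumption.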