nhoffman / dada2-nf

A Nextflow pipeline for processing 16S rRNA sequences using dada2

Incorporate cutadapt (or equivalent tool), add adaptor/primer seqs to parameter inputs #55

Closed dhoogest closed 1 year ago

dhoogest commented 1 year ago

As a complementary strategy for some amplicons (particularly those where variable length is expected), it would be beneficial to add an option to remove primer/adaptor sequences. Since this functionality isn't directly supported in dada2, including a tool such as cutadapt (recommended by the dada2 authors and me) would be a nice addition.

See also our motivation in the SRSLY context here: https://gitlab.labmed.uw.edu/molmicro/ops/-/issues/1038

nhoffman commented 1 year ago

@crosenth - let's scope this out

crosenth commented 1 year ago

https://cutadapt.readthedocs.io/en/stable/guide.html

dhoogest commented 1 year ago

I've been working through the mechanics of this with some manual cutadapt testing today. I think we'll need to insert the cutadapt step after barcodecop, to account for scenarios where adaptor detection truncates a full read from the fastq. There are two failure modes if cutadapt runs first. If a zero-length read is present in the fastq, barcodecop returns a malformed error. If only non-zero-length seqs are retained in a cutadapt-filtered fastq, the seq.id values no longer match between the R1/R2 file and its index.
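
To make the two failure modes concrete, here is a minimal sketch in Python that compares read IDs between a trimmed read file and its index file. It is not part of dada2-nf or barcodecop: the function names `read_fastq` and `compare_ids` and the toy records are hypothetical, and it assumes plain four-line FASTQ records whose ID is the first whitespace-delimited token of the header.

```python
import io
from itertools import islice


def read_fastq(handle):
    """Yield (read_id, sequence) from a four-line-per-record FASTQ handle."""
    while True:
        record = list(islice(handle, 4))
        if not record:
            return
        if len(record) < 4 or not record[0].startswith("@"):
            raise ValueError("malformed FASTQ record")
        header = record[0].rstrip("\n")
        seq = record[1].rstrip("\n")
        # Assumption: the read ID is the header's first token, minus the '@'.
        yield header[1:].split()[0], seq


def compare_ids(reads_handle, index_handle):
    """Report whether reads and index are in sync, plus the offending IDs."""
    reads = list(read_fastq(reads_handle))
    read_ids = [rid for rid, _ in reads]
    index_ids = [rid for rid, _ in read_fastq(index_handle)]
    seen = set(read_ids)
    return {
        # True only if both files list the same IDs in the same order.
        "in_sync": read_ids == index_ids,
        # Reads trimmed down to nothing (the zero-length case).
        "empty_reads": [rid for rid, seq in reads if not seq],
        # Index records whose read was dropped (the filtered case).
        "missing_from_reads": [rid for rid in index_ids if rid not in seen],
    }


# Toy data (hypothetical): three index reads, and two versions of trimmed R1.
INDEX = "@r1 1:N:0\nACGT\n+\nIIII\n@r2 1:N:0\nACGT\n+\nIIII\n@r3 1:N:0\nACGT\n+\nIIII\n"
# Case 1: r2 was trimmed to zero length but kept in the file.
WITH_EMPTY = "@r1 1:N:0\nACGTACGT\n+\nIIIIIIII\n@r2 1:N:0\n\n+\n\n@r3 1:N:0\nACG\n+\nIII\n"
# Case 2: r2 was dropped entirely, so the IDs no longer line up with the index.
FILTERED = "@r1 1:N:0\nACGTACGT\n+\nIIIIIIII\n@r3 1:N:0\nACG\n+\nIII\n"

with_empty_report = compare_ids(io.StringIO(WITH_EMPTY), io.StringIO(INDEX))
filtered_report = compare_ids(io.StringIO(FILTERED), io.StringIO(INDEX))
print(with_empty_report)
print(filtered_report)
```

In the first case the IDs still line up but one record is empty; in the second the files are out of sync because the index still carries the dropped read. Running cutadapt after barcodecop avoids both, since barcodecop then sees only untrimmed reads.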