dhoogest closed 1 year ago
@crosenth - let's scope this out
I worked on the mechanics of this a bit today with some manual cutadapt testing. I think we'll need to insert the cutadapt step after barcodecop, to account for scenarios where adaptor detection trims an entire read in the fastq down to nothing. If a zero-length read is present in the fastq, barcodecop returns a "malformed" error. If instead only non-zero-length seqs are retained in the cutadapt-filtered fastq, the seq.id values no longer match between the R1/R2 files and their index.
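To make the ID mismatch concrete, here is a minimal sketch of what keeping R1, R2, and the index in sync would involve if zero-length reads had to be dropped by hand. It is not how barcodecop or cutadapt work internally; the names `parse_fastq`, `read_id`, `nonempty_ids`, and `sync_reads` are hypothetical, and it assumes plain 4-line FASTQ records whose read name is the header text up to the first whitespace.

```python
from typing import Iterable, Iterator, List, Set, Tuple

# One FASTQ record: (header, sequence, separator, quality)
Record = Tuple[str, str, str, str]


def parse_fastq(lines: Iterable[str]) -> Iterator[Record]:
    """Yield 4-line FASTQ records. Empty sequence/quality lines are kept,
    so a read trimmed to zero length still comes through as a record."""
    buf: List[str] = []
    for line in lines:
        buf.append(line.rstrip("\r\n"))
        if len(buf) == 4:
            yield (buf[0], buf[1], buf[2], buf[3])
            buf = []


def read_id(record: Record) -> str:
    """Read name: header without the leading '@', up to the first whitespace
    (assumption: this is the part shared by R1, R2, and the index read)."""
    return record[0][1:].split()[0]


def nonempty_ids(records: Iterable[Record]) -> Set[str]:
    """IDs of reads whose sequence is longer than zero."""
    return {read_id(r) for r in records if len(r[1]) > 0}


def sync_reads(
    r1: List[Record], r2: List[Record], index: List[Record]
) -> Tuple[List[Record], List[Record], List[Record]]:
    """Keep only reads that are non-empty in both R1 and R2, and apply the
    same selection to the index so seq.id values still line up."""
    keep = nonempty_ids(r1) & nonempty_ids(r2)
    out_r1 = [rec for rec in r1 if read_id(rec) in keep]
    out_r2 = [rec for rec in r2 if read_id(rec) in keep]
    out_index = [rec for rec in index if read_id(rec) in keep]
    return out_r1, out_r2, out_index
```

Running cutadapt after barcodecop avoids needing this kind of bookkeeping, since the index has already been consumed by the time reads can be trimmed to zero length.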
As a complementary strategy for some amplicons (particularly those where variable length is expected), it would be beneficial to add an option to remove primer/adaptor sequences. Since this functionality isn't directly supported in dada2, including a tool such as cutadapt (recommended by the dada2 authors and by me) would be a nice addition.
See also our motivation in the SRSLY context here: https://gitlab.labmed.uw.edu/molmicro/ops/-/issues/1038