mhoban closed this 2 months ago.
I think the command-line options for this ought to be restructured as well. Rather than a simple on/off `--illumina-demultiplexed` flag (which I always thought was a cumbersome option), we should have:
`--demultiplexed-by <val>`

where `<val>` can be one of:

- `index`: assume that all fastq files already represent individual samples.
- `barcode`: annotate samples in the fastq file(s) using the barcode file(s), then split the annotations by sample.
- `combined`: samples are multiplexed with barcoded primers and pooled by Illumina index pair, with primer barcode combos reused across different index pairs.
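For illustration, here is a minimal Python sketch of how a three-valued option like this could be validated and mapped to processing steps. It uses `argparse` purely as a stand-in (the pipeline's actual parameter handling isn't shown in this thread), and the step lists in the hypothetical `plan()` helper are my reading of the option descriptions above, not the real implementation.

```python
import argparse

# Hypothetical one-line summaries of each mode, paraphrasing the proposal;
# the real pipeline's parameter handling is not shown in this thread.
MODES = {
    "index": "fastq files already represent individual samples",
    "barcode": "annotate reads with barcode file(s), then split by sample",
    "combined": "barcode combos are reused across different index pairs",
}


def parse_args(argv=None):
    """Parse a --demultiplexed-by option restricted to the three modes."""
    parser = argparse.ArgumentParser(description="demultiplexing mode sketch")
    # choices= makes argparse reject anything outside MODES (exit status 2)
    parser.add_argument(
        "--demultiplexed-by",
        dest="demultiplexed_by",
        choices=sorted(MODES),
        required=True,
        help="; ".join(f"{k}: {v}" for k, v in MODES.items()),
    )
    return parser.parse_args(argv)


def plan(mode):
    """Return hypothetical processing steps for a mode (illustrative only)."""
    steps = {
        "index": ["treat each fastq file as one sample"],
        "barcode": ["annotate reads using barcode file(s)",
                    "split annotations by sample"],
        # assumed: fastq files are per index pair, barcodes reused within each
        "combined": ["treat each fastq file as one index-pair pool",
                     "annotate reads using barcode file(s) within each pool",
                     "split annotations by (index pair, barcode) sample"],
    }
    return steps[mode]
```

With this setup, `parse_args(["--demultiplexed-by", "combined"])` succeeds, while an unrecognized value such as `indices` makes argparse exit with a usage error, so a typo in the mode can't silently fall through to the wrong branch.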
This issue will also deal with making sure process execution is streamlined, a la #89
This is sort of becoming a catch-all issue, but I discovered that I needed to add `--fastq_qmax` to the vsearch process for AVITI sequences because their quality scores exceeded vsearch's default maximum of 41. I'm recording that here because it won't get mentioned in the commit message.
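If you need to pick a `--fastq_qmax` value for data from a new instrument, one option is to scan the reads for their highest quality score first. The sketch below does that, assuming standard four-line FASTQ records (no wrapped sequences) with Phred+33 encoding; `max_phred_from_lines` and `max_phred` are hypothetical helpers, not part of the pipeline.

```python
import gzip


def max_phred_from_lines(lines, offset=33):
    """Return the highest Phred score in an iterable of FASTQ lines.

    Assumes plain 4-line records (no wrapped sequences) and Phred+33
    encoding; returns None if there are no quality lines.
    """
    highest = None
    for i, line in enumerate(lines):
        if i % 4 != 3:  # quality string is the 4th line of each record
            continue
        qual = line.rstrip("\r\n")
        if qual:
            score = max(ord(c) for c in qual) - offset
            highest = score if highest is None else max(highest, score)
    return highest


def max_phred(path, offset=33):
    """Scan a (optionally gzipped) FASTQ file for its highest quality score."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as fh:
        return max_phred_from_lines(fh, offset)
```

The returned value (or anything at or above it) could then be passed to vsearch as `--fastq_qmax`.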
I'm also addressing #75 here.
This should now be functioning, pending testing with data other than the somewhat messed-up dataset I had access to.
As written, the pipeline currently expects either samples multiplexed using barcoded primers OR samples separated using Illumina indices. It turns out that sometimes people do both, pooling barcoded samples together into units delineated by Illumina indices (and reusing barcodes across index pairs). While this can get pretty sticky if folks are not extremely careful with their sample tracking, it seems to be a valid use case and the pipeline should support it.
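In the combined case, a primer barcode pair only identifies a sample within a particular index pair, so the sample key has to include both. Here is a minimal sketch, assuming a hypothetical sample sheet with one row per sample (name, i7 index, i5 index, forward barcode, reverse barcode); `build_sample_map` and `assign` are illustrative names, not pipeline functions. It also rejects duplicate keys, which is one cheap guard against the sample-tracking mistakes mentioned above.

```python
def build_sample_map(rows):
    """Map ((i7, i5), (fwd_bc, rev_bc)) -> sample name.

    Rows are assumed to be (sample, i7, i5, fwd_bc, rev_bc) tuples from a
    hypothetical sample sheet. Raises ValueError if two samples share the
    same full key, since their reads could not be told apart.
    """
    sample_map = {}
    for sample, i7, i5, fwd_bc, rev_bc in rows:
        key = ((i7, i5), (fwd_bc, rev_bc))
        if key in sample_map:
            raise ValueError(
                f"{sample} and {sample_map[key]} share index pair {key[0]} "
                f"and barcode pair {key[1]}")
        sample_map[key] = sample
    return sample_map


def assign(sample_map, index_pair, barcode_pair):
    """Return the sample for a read's index pool and barcode pair, or None."""
    return sample_map.get((index_pair, barcode_pair))
```

Here the same barcode pair resolves to different samples depending on which index pool the read came from, which is exactly what the current either/or logic can't express.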