Open ryandward opened 1 year ago
This may be easy to overlook, but it is documented in the section about demultiplexing:
Paired-end demultiplexing always uses the adapter matches of the first read to decide where a read should be written. If adapters for read 2 are given (
-A
/-G
), they are detected and removed as normal, but these matches do not influence where the read pair is written.To demultiplex using a barcode that is located on read 2, you can “cheat” and swap the roles of R1 and R2 for both the input and output files
cutadapt -e 1 -g ^file:barcodes.fasta -o trimmed-{name}.2.fastq.gz -p trimmed-{name}.1.fastq.gz input.2.fastq.gz input.1.fastq.gz
If you do this in a script or pipeline, it may be a good idea to add a comment to clarify that this reversal of R1 and R2 is intended.
I wonder how to improve this. I think the first step is to print a warning when trying to demultiplex and no adapters were provided for R1.
Thanks for the clarification. I finally did wind up cheating by swapping the reads, which is totally okay by the way. I think your proposal for a warning makes a lot of sense.
My script will then have to swap the reads back from [2,1] to [1,2] in a downstream process, but it's not at all broken.
Cheers.
I’ll also see whether I can make it so that you can specify {name2}
as a template variable. That would then explicitly declare that you want to use the matches on R2 for demultiplexing. Then no file swapping is necessary.
It doesn't seem that switching -g to -G, and also switching the order of the reads results in the same results.
This results in everything going to files with
unknown
in the variable portion of the name.However, when I switch around
-g
to-G
, and switch places ofread1
andread2
, adapter trimming works as expected.❯ conda --version conda 22.11.1 ❯ cutadapt --version 4.3