Closed cpavloud closed 1 week ago
For each of the commands that you listed, either the 5' adapter or the 3' adapter is removed from each read (depending on which one matches better) because Cutadapt by default removes only a single adapter from each read. (This holds even for paired-end reads: It will remove at most one from R1 and at most one from R2.)
You should probably use linked adapters if you want to remove a 5' and a 3' adapter at the same time.
So the first command would look something like this:
cutadapt \
--pair-adapters \
-a ^TGGTGCATGGCCGTTCTTAGT...GGTCTGTGATGCCCTTAGATG \
-A ^CATCTAAGGGCATCACAGACC...ACTAAGAACGGCCATGCACCA \
-o 18S_R1_001.fastq \
-p 18S_R2_001.fastq \
--untrimmed-output R1_untrimmed_18S.fastq \
--untrimmed-paired-output R2_untrimmed_18S.fastq \
input_R1_001.fastq \
input_R2_001.fastq
However, you can also use Cutadapt’s ability for demultiplexing to simplify the three commands into a single one (untested):
cutadapt \
--pair-adapters \
-a 18S=^TGGTGCATGGCCGTTCTTAGT...GGTCTGTGATGCCCTTAGATG \
-A ^CATCTAAGGGCATCACAGACC...ACTAAGAACGGCCATGCACCA \
-a COI=^GGWACWGGWTGAACWGTWTAYCCYCC...TGRTTYTTYGGNCAYCCNGARGTNTA \
-A ^TANACYTCNGGRTGNCCRAARAAYCA...GGRGGRTAWACWGTTCAWCCWGTWCC \
-a ITS=^CTTGGTCATTTAGAGGAAGTAA...GCATCGATGAAGAACGCAGC \
-A ^GCTGCGTTCTTCATCGATGC...TTACTTCCTCTAAATGACCAAG \
-o "output_{name}_R1.fastq" \
-p "output_{name}_R2.fastq" \
input_R1_001.fastq \
input_R2_001.fastq
This will then create six output files output_18S_R1.fastq
, output_COI_R1.fastq
and so on and even output_unknown_R1.fastq
and output_unknown_R2.fastq
for the reads without a match.
Hi @marcelm
thanks for your suggestions regarding this issue. Just a brief question:
When specifying for example -a COI=^GGWACWGGWTGAACWGTWTAYCCYCC...TGRTTYTTYGGNCAYCCNGARGTNTA \ for forward reads, I am wondering if this creates a new problem. If there is no read through during MiSeq paired-end 300bp sequencing because the amplicon is longer than 300bp, then the reverse complement of the reverse primer (the second sequence specified here) will not be found in the forward reads. Does that mean then that the read will be discarded or written to the unknown.fastq file, because both specified primer sequences need to be found in a read for being considered for pairing and trimming?
Thanks,
Nauras
For exactly the reason you mention (because the read may not be short enough), the second specified sequence is actually optional. So whether a read is considered to be untrimmed or not is entirely determined by the 5' adapter only, but the 3' adapter will still be trimmed if found.
In slightly more detail: When you use -a
to specify a linked adapter, a constituent adapter that is anchored is also considered to be required (and a non-anchored one is considered optional). The documentation has a bit more detail (but it may not be 100% clear on this): https://cutadapt.readthedocs.io/en/latest/guide.html#linked-adapters-combined-5-and-3-adapter
thanks so much for the quick reply. had a look again at the documentation and the issue is actually described properly there.
thanks Nauras
And on the other hand, if you wanted to require both the forward and reverse primers be present, you'd use -g
instead of -a
Thanks Peter, good to know.
Just as a little hint when using the following demultiplexing command:
cutadapt \
--pair-adapters \
-a 18S=^TGGTGCATGGCCGTTCTTAGT...GGTCTGTGATGCCCTTAGATG \
-A ^CATCTAAGGGCATCACAGACC...ACTAAGAACGGCCATGCACCA \
-a COI=^GGWACWGGWTGAACWGTWTAYCCYCC...TGRTTYTTYGGNCAYCCNGARGTNTA \
-A ^TANACYTCNGGRTGNCCRAARAAYCA...GGRGGRTAWACWGTTCAWCCWGTWCC \
-a ITS=^CTTGGTCATTTAGAGGAAGTAA...GCATCGATGAAGAACGCAGC \
-A ^GCTGCGTTCTTCATCGATGC...TTACTTCCTCTAAATGACCAAG \
-o "output_{name}_R1.fastq" \
-p "output_{name}_R2.fastq" \
input_R1_001.fastq \
input_R2_001.fastq
In our case, the demultiplexing resulted in quite a few reads containing sequences with a length of zero bp which created some problems for downstream analysis. Guess it would be a good idea to add
- m 1
to the command (or whatever length specification toher than 1) to discard zero bp reads.
Cheers Nauras
This appears to have been resolved, closing.
Hello!
I am using cutadapt to demultiplex my reads (v2.10 with Python 3.7.6).
Basically, I have a pair of fastq files (e.g. input_R1_001.fastq and input_R2_001.fastq) which contain sequences from three genes (18S, COI and ITS) and I want to create a pair of fastq files for each gene.
Here is an example of the command I am using: `cutadapt --pair-adapters -g TGGTGCATGGCCGTTCTTAGT -a GGTCTGTGATGCCCTTAGATG -G CATCTAAGGGCATCACAGACC -A ACTAAGAACGGCCATGCACCA -o 18S_R1_001.fastq -p 18S_R2_001.fastq input_R1_001.fastq input_R2_001.fastq --untrimmed-output R1_untrimmed_18S.fastq --untrimmed-paired-output R2_untrimmed_18S.fastq
cutadapt --pair-adapters -g GGWACWGGWTGAACWGTWTAYCCYCC -a TGRTTYTTYGGNCAYCCNGARGTNTA -G TANACYTCNGGRTGNCCRAARAAYCA -A GGRGGRTAWACWGTTCAWCCWGTWCC -o COI_R1_001.fastq -p COI_R2_001.fastq input_R1_001.fastq input_R2_001.fastq --untrimmed-output R1_untrimmed_COI.fastq --untrimmed-paired-output R2_untrimmed_COI.fastq
cutadapt --pair-adapters -g CTTGGTCATTTAGAGGAAGTAA -a GCATCGATGAAGAACGCAGC -G GCTGCGTTCTTCATCGATGC -A TTACTTCCTCTAAATGACCAAG -o ITS_R1_001.fastq -p ITS_R2_001.fastq input_R1_001.fastq input_R2_001.fastq --untrimmed-output R1_untrimmed_ITS.fastq --untrimmed-paired-output R2_untrimmed_ITS.fastq `
The problem is that there are still primers found in the output files, so they are not completely trimmed/removed by cutadapt. Do you have any idea why this might be happening?