Closed ForestCat1 closed 2 years ago
Thanks for reporting. It turns out that this was a bug. The reported error counts actually show the total number of errors (errors in the 5' adapter plus errors in the 3' adapter). This was of course not the intention and I have fixed it now.
Regarding --nextseq-trim
: This option only accepts a single threshold because trimming the 5' end with the algorithm specific to dark-cycles does not make too much sense. To trim the 5' end with the regular algorithm and the 3' end with the NextSeq algorithm, you can indeed use --nextseq-trim=10 -q 15,0
, but testing it just now, I see that the reported statistics will be incorrect here as well. I’ve opened #616 for that.
Just writing to let you know that replying to a closed issue will not re-open it.
Hi, Marcel, thanks for developing this efficient tool! I'm using Cutadapt 3.7 (installed by conda) with Python 3.8.12, since the structure of my library is
5'-P5ADAPTER-RECsite1-xxx-RECsite2-P7ADAPTER-3' 3'-P5ADAPTER-RECsite1-xxx-RECsite2-P7ADAPTER-5' (sequenced on Illumina NovaSeq, paired-end 150 bp, REC=restriction enzyme cutting, 6 bp for each RECsite)
I ran Cutadapt with following parameters to trim 3'- low quality bases, remove the 5'- RECsite and 3'-RECsite+ADAPTER:
--no-indels --nextseq-trim=10 -O 9 (I have noticed the default setting of -O is 3, and -O has no effect on anchored adapter, so I added the fixed length of RECsite) -a ^RECsite1...RECsite2AGATCGGAAGAGCACACGTCTGAACTCCAGTCA -A ^RECsite2...RECsite1AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT -o /path/out1.fq.gz -p /path/out2.fq.gz -m 30 /path/Read1.fq.gz /path/Read2.fq.gz
Then I got this report: Processing reads on 1 core in paired-end mode ... === First read: Adapter 1 ===
Sequence: RECsite1...RECsite2AGATCGGAAGAGCACACGTCTGAACTCCAGTCA; Type: linked; Length: 6+39; 5' trimmed: 32999357 times; 3' trimmed: 602797 times
Minimum overlap: 6+9 No. of allowed errors: 0
No. of allowed errors: 1-9 bp: 0; 10-19 bp: 1; 20-29 bp: 2; 30-39 bp: 3
Overview of removed sequences at 5' end length count expect max.err error counts 6 32999357 8110.2 0 32981415 14556 2551 835
Overview of removed sequences at 3' end (omitted) === Second read: Adapter 2 ===
Sequence: RECsite2...RECsite1AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT; Type: linked; Length: 6+39; 5' trimmed: 32912355 times; 3' trimmed: 613241 times
Minimum overlap: 6+9 No. of allowed errors: 0
No. of allowed errors: 1-9 bp: 0; 10-19 bp: 1; 20-29 bp: 2; 30-39 bp: 3
Overview of removed sequences at 5' end length count expect max.err error counts 6 32912355 8110.2 0 32888926 20516 2388 525
Overview of removed sequences at 3' end (omitted)
I wonder why the removed sequences at 5' end can have non-zero error counts, as the max.err for 5'-anchored RECsite is 0.
By the way, as a "by-product", the polyGs (reads only containing G) reported by fastqc (overrepresented sequence) seemed to be effectively removed since they do not have anchored 5'-RECsite, but actually I have not do 5'- quality trimming this time.
I'm not sure whether the "--nextseq-trim=" can be used as "-q" when I want to perform both 5'- and 3'- quality trimming with NovaSeq data (e.g., --nextseq-trim=15,10). If I can't specify 2 numbers in "--nextseq-trim", will "-q 15,0 --nextseq-trim=10" work well? Because I want to perform 5'- and 3'- quality trimming in one Cutadapt run.
Yours, sincerely