For each read, Cutadapt does two things:

1. It modifies the read (adapter trimming and other read modifications).
2. It applies the filtering criteria to decide the fate of the read (whether it is discarded, written to the regular output file, or redirected to one of the special output files such as the one given by --too-long-output).

The filtering criteria are always checked in a fixed order (the order in which the command-line options are listed under the "Filtering of processed reads" heading in the cutadapt --help output).
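If you want to see that fixed order on your own installation, one way (a sketch; it assumes the help section ends at the next blank line) is to slice it out of the help output:

```
# Print the "Filtering of processed reads" section of `cutadapt --help`;
# the options appear in the order in which the filters are applied.
cutadapt --help | sed -n '/Filtering of processed reads/,/^$/p'
```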
The "Read fate breakdown" section in the log shows how many reads made it how far through those filtering steps:
```
== Read fate breakdown ==
Reads that were too short:                  52 (0.0%)
Reads that were too long:              452,806 (14.0%)
Reads with too many N:                     225 (0.0%)
Reads discarded as untrimmed:                0 (0.0%)
Reads written (passing filters):     2,778,159 (86.0%)
```
If a filtering criterion applies, the read is discarded (or redirected) and is not checked against the remaining filters. This is what is happening here: your maximum length filtering criterion is -M 60, but your input reads have a length of 75 bp. This means that an untrimmed read is always caught by the "too long" filter because that filter is applied before the untrimmed filter.
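Here is a minimal sketch of a command that reproduces this behaviour; the adapter sequence and file names are placeholders, not the actual command from this issue. An untrimmed read keeps its full 75 bp, fails -M 60, and is routed to the too-long output before the untrimmed filter is ever consulted:

```
# Hypothetical example: reads that do not contain the (made-up) adapter stay
# 75 bp long, exceed -M 60, and end up in too_long.fastq instead of being
# counted as "discarded as untrimmed".
cutadapt -a AGATCGGAAGAGC -M 60 --discard-untrimmed \
    --too-long-output too_long.fastq \
    -o trimmed.fastq input.fastq
```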
Thanks a lot for the clarification!
Hi Marcel,
I don't understand the classification of reads as "too long".
In the log file, 14% of my reads are too long but none are discarded as untrimmed.
When I write untrimmed and "too long" sequences to separate files, nothing is written to the untrimmed file.
The sequences written to "too long" don't contain any adapter sequence.
Thus, I don't understand why they are flagged as "too long".
Sequences are ultimately trimmed correctly, but for QC purposes I would like to understand the reasoning better.
Thanks a lot!
I'm using cutadapt v4.4 installed with conda.
The command I'm running:
The beginning of the log file:
Sequences that are written to too_long.fastq (sample):

Cheers,
Henrietta