Lasse Krøger Eliassen asked about the correct way to implement forward read trimming and filtering, as described in Minardi et al. 2021.
Forward reads were trimmed to 200 bp in length approximately corresponding to the point at which the lower quartile fell below 20. Low quality reads were removed when estimated errors were greater than two and truncated if quality scores fell below two.
The proposed implementation is correct. It could be improved as follows:

- use --fastx_filter rather than --fastq_filter (more generic),
- --fastq_trunclen_keep may be better suited than --fastq_trunclen (it keeps reads that are shorter than the truncation length, if any).

Finally, the --fastq_truncqual value is dataset-dependent and could be deduced from the output of --fastq_stats. At the end of the log file, for my particular dataset, --fastq_truncqual 5 would yield a length of 148 nucleotides for more than 95% of the reads.
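To make the suggestions concrete, here is a minimal sketch of the two vsearch calls discussed in this thread. The file names are hypothetical placeholders. The values (200, 2, 2) are simply my reading of the Minardi et al. 2021 description, not the authors' actual command line, so please check them against your own --fastq_stats output before relying on them. The script only runs vsearch if it is installed and the input file exists; otherwise it just prints the commands.

```shell
#!/bin/sh
# Hypothetical file names: replace with your own.
INPUT="forward_reads.fastq"
FILTERED="forward_reads.filtered.fastq"
STATS_LOG="forward_reads.stats.log"

# 1. Quality statistics, written to a log file. The end of the log
#    helps choose a dataset-specific --fastq_truncqual value.
STATS_CMD="vsearch --fastq_stats ${INPUT} --log ${STATS_LOG}"

# 2. Truncation and filtering (assumed mapping of the paper's text):
#    --fastq_trunclen_keep 200 : truncate to 200 nt, keep shorter reads
#    --fastq_truncqual 2       : truncate at the first low-quality base
#    --fastq_maxee 2           : discard reads with more than 2 expected errors
FILTER_CMD="vsearch --fastx_filter ${INPUT} --fastq_trunclen_keep 200 --fastq_truncqual 2 --fastq_maxee 2 --fastqout ${FILTERED}"

if command -v vsearch >/dev/null 2>&1 && [ -f "${INPUT}" ]; then
    # Unquoted on purpose: the command strings are split on spaces,
    # which is safe here because the file names contain none.
    ${STATS_CMD}
    ${FILTER_CMD}
else
    # Dry run: show what would be executed.
    printf '%s\n' "${STATS_CMD}" "${FILTER_CMD}"
fi
```

If --fastq_stats suggests a different quality threshold for your data (as it did for mine), only the --fastq_truncqual value needs to change.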