marcelm / cutadapt

Cutadapt removes adapter sequences from sequencing reads
https://cutadapt.readthedocs.io
MIT License

Small ONT optimization: length filtering before applying trimming #743

Open: rhpvorderman opened this issue 10 months ago

rhpvorderman commented 10 months ago

For Illumina reads, which are always 151 bp long, it makes sense to trim the adapter first, then apply quality trimming, and only after that apply length and quality filtering.

On Nanopore data, however, this means that with a minimum-length filter of, say, 500 bp, a certain number of reads undergo the expensive adapter trimming even though they could never pass the length filter in the first place. A 400 bp read will never pass the length filter, because adapter trimming can only make a read shorter. Applying a very cheap length filter at the beginning of the pipeline can therefore improve performance.
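A minimal sketch of the proposed ordering, assuming a toy pipeline rather than cutadapt's actual internals: `Read`, `prefilter_then_trim` and `naive_trim` are hypothetical names, and `naive_trim` is a stand-in (cut at the first exact adapter match) for real adapter trimming. The point it shows is that a minimum-length check is safe to run before trimming, since trimming never lengthens a read.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator

ADAPTER = "AGATCGGAAGAGC"  # hypothetical adapter sequence, for illustration only


@dataclass
class Read:
    name: str
    sequence: str


def naive_trim(read: Read) -> Read:
    """Hypothetical stand-in for adapter trimming: cut at the first exact match."""
    pos = read.sequence.find(ADAPTER)
    if pos == -1:
        return read
    return Read(read.name, read.sequence[:pos])


def prefilter_then_trim(
    reads: Iterable[Read], trim: Callable[[Read], Read], min_length: int
) -> Iterator[Read]:
    """Yield trimmed reads that are at least min_length long.

    Trimming can only shorten a read, so a read that is already shorter
    than min_length is dropped before the expensive trim step runs.
    """
    for read in reads:
        if len(read.sequence) < min_length:
            continue  # cheap check: this read can never pass the filter
        trimmed = trim(read)
        if len(trimmed.sequence) >= min_length:  # the usual post-trim filter
            yield trimmed


# Usage: count how often the (expensive) trim step is actually called.
calls = []


def counting_trim(read: Read) -> Read:
    calls.append(read.name)
    return naive_trim(read)


reads = [
    Read("r1", "A" * 400),                          # too short from the start
    Read("r2", "C" * 600 + ADAPTER + "T" * 50),     # 600 bp after trimming
    Read("r3", "G" * 450 + ADAPTER + "T" * 200),    # 450 bp after trimming
]
kept = list(prefilter_then_trim(reads, counting_trim, min_length=500))
print([r.name for r in kept], calls)  # ['r2'] ['r2', 'r3']
```

The output is the same as trimming everything and filtering afterwards, but `r1` never reaches the trim step. Note that this shortcut only holds for a minimum-length filter; a maximum-length filter cannot be applied early, because trimming may bring a long read under the limit.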

marcelm commented 10 months ago

Good idea, but I’ll need to think about how best to do this. Because the adapter trimming statistics would change, it is a backwards-incompatible change.
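To illustrate why the statistics would change, here is a toy model, not cutadapt's real report: the function `count_adapter_reads` and the adapter sequence are hypothetical. It counts reads in which an adapter was found, with and without the early length filter. The set of reads that pass is identical in both cases, but a short read that contains an adapter is only counted when it is trimmed before being filtered out.

```python
from typing import Iterable, Tuple

ADAPTER = "AGATCGGAAGAGC"  # hypothetical adapter sequence, for illustration only


def count_adapter_reads(
    sequences: Iterable[str], min_length: int, prefilter: bool
) -> Tuple[int, int]:
    """Return (reads_with_adapter, reads_passing) for a toy pipeline.

    reads_with_adapter mimics a trimming statistic: the number of reads in
    which the trimming step found the adapter.
    """
    with_adapter = 0
    passing = 0
    for seq in sequences:
        if prefilter and len(seq) < min_length:
            continue  # dropped early: never seen by the trimming step
        pos = seq.find(ADAPTER)
        if pos != -1:
            with_adapter += 1
            seq = seq[:pos]
        if len(seq) >= min_length:
            passing += 1
    return with_adapter, passing


reads = [
    "A" * 300 + ADAPTER + "T" * 50,   # 363 bp, contains an adapter
    "C" * 600 + ADAPTER + "T" * 50,   # 663 bp, contains an adapter
]
print(count_adapter_reads(reads, 500, prefilter=False))  # (2, 1)
print(count_adapter_reads(reads, 500, prefilter=True))   # (1, 1)
```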