marcelm / cutadapt

Cutadapt removes adapter sequences from sequencing reads
https://cutadapt.readthedocs.io
MIT License

`--cut` **after** adapter trimming #639

Closed bounlu closed 2 years ago

bounlu commented 2 years ago

Is there a way to remove bases from the 3' end AFTER adapter trimming?

Like the `--three_prime_clip_r1` parameter in trim_galore, which is needed for Nextflex libraries.

marcelm commented 2 years ago

If the intention is to remove a few additional bases whenever an adapter is found, you can add the appropriate number of `N` wildcard characters to the adapter sequence. So if you would normally search for `-a ACGT` and also want to remove the three bases preceding the match, use `-a NNNACGT` instead. For a larger number of `N`s, you can use the curly brace notation and provide an explicit count: `-a N{8}ACGT` (`N{8}` is the same as writing `N` eight times).
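To illustrate the idea, here is a minimal Python sketch of what prepending `N` wildcards does. It is a toy model, not cutadapt's actual alignment code: it assumes exact matching (no error tolerance), considers only full-length occurrences of the adapter, and takes the leftmost one. The function names `expand_adapter` and `trim_full_match` are hypothetical and exist only for this example.

```python
import re


def expand_adapter(adapter: str) -> str:
    """Expand curly-brace repeats, e.g. N{8}ACGT -> NNNNNNNNACGT."""
    return re.sub(
        r"([A-Za-z])\{(\d+)\}",
        lambda m: m.group(1) * int(m.group(2)),
        adapter,
    )


def trim_full_match(read: str, adapter: str) -> str:
    """Toy 3' adapter trimming: cut at the leftmost exact, full-length
    occurrence of the adapter, where N in the adapter matches any base.
    Everything from the start of the match onwards is removed."""
    adapter = expand_adapter(adapter)
    for start in range(len(read) - len(adapter) + 1):
        window = read[start:start + len(adapter)]
        if all(a == "N" or a == b for a, b in zip(adapter, window)):
            return read[:start]
    return read  # no match: read is left untouched


read = "TTTTGGGACGTCCCC"
print(trim_full_match(read, "ACGT"))      # TTTTGGG
print(trim_full_match(read, "NNNACGT"))   # TTTT (three extra bases removed)
print(trim_full_match(read, "N{3}ACGT"))  # TTTT (same as NNNACGT)
```

In this model the three `N`s simply widen the match to the left, so the three bases in front of `ACGT` are removed together with the adapter.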

bounlu commented 2 years ago

Great, that would work for me, thank you.

bounlu commented 2 years ago

Just wondering one thing. Are these two equivalent?

  1. cutadapt -a NNNACGT

  2. cutadapt -a ACGT | cutadapt -u -3 -

marcelm commented 2 years ago

It depends on how --overlap is set.

If `--overlap=3` is used (the default), then these commands behave similarly. The `NNNACGT` sequence matches the 3' end of every read because the three `N`s alone fulfill the minimum overlap criterion, so no matter how the read ends, at least three bases are removed.

If you have a larger overlap, let’s say `--overlap=6`, then the first command will find a match in a read that ends in `...ACG` and remove it and the three bases preceding it, but if there’s no match, the read remains untrimmed, whereas the second command still removes three bases from every read.
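The effect of the minimum overlap can be sketched with a small Python toy model. This is not cutadapt's implementation. It assumes exact matching without error tolerance, takes the leftmost match, allows the adapter to overlap the 3' end of the read only partially, and caps the minimum overlap at the adapter length. The helper names (`matches`, `trim_3prime`, `one_step`, `two_step`) are made up for this illustration, and the second command is assumed to use the same overlap setting in its first step.

```python
def matches(adapter_part: str, read_part: str) -> bool:
    """Exact comparison in which an N in the adapter matches any base."""
    return len(adapter_part) == len(read_part) and all(
        a == "N" or a == b for a, b in zip(adapter_part, read_part)
    )


def trim_3prime(read: str, adapter: str, min_overlap: int = 3) -> str:
    """Toy 3' adapter trimming with partial matches at the read end."""
    # Assumption: the required overlap cannot exceed the adapter length.
    min_overlap = min(min_overlap, len(adapter))
    for start in range(len(read)):
        # Near the 3' end only a prefix of the adapter fits into the read.
        overlap = min(len(adapter), len(read) - start)
        if overlap < min_overlap:
            break  # later positions overlap even less
        if matches(adapter[:overlap], read[start:start + overlap]):
            return read[:start]
    return read  # no match: read is left untouched


def one_step(read: str, min_overlap: int) -> str:
    """Model of: cutadapt -a NNNACGT"""
    return trim_3prime(read, "NNNACGT", min_overlap)


def two_step(read: str, min_overlap: int) -> str:
    """Model of: cutadapt -a ACGT | cutadapt -u -3 -"""
    return trim_3prime(read, "ACGT", min_overlap)[:-3]


no_adapter = "TTTTGGGCCC"
print(one_step(no_adapter, 3), two_step(no_adapter, 3))  # TTTTGGG TTTTGGG
print(one_step(no_adapter, 6), two_step(no_adapter, 6))  # TTTTGGGCCC TTTTGGG
print(one_step("TTTTGGGACG", 6))                         # TTTT
```

With a minimum overlap of 3, the `NNN` prefix alone matches the last three bases of the adapter-free read, so both variants remove three bases. With a minimum overlap of 6, the one-step variant leaves that read untouched, while the unconditional `-u -3` step still shortens it.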