marcelm / cutadapt

Cutadapt removes adapter sequences from sequencing reads
https://cutadapt.readthedocs.io
MIT License
502 stars 125 forks

Remove instead of cut adapter #716

Open EivindStensrud opened 12 months ago

EivindStensrud commented 12 months ago

Hi, I am wondering whether Cutadapt could offer an option to remove only the primer/adapter sequence itself, without cutting the read at that point. I would like to keep the sequence information that lies in front of the adapter I want to remove.

Example, for unmerged reads: INTERESTINGadapterAMPLICON -> INTERESTINGAMPLICON

or, for merged reads: INTERESTINGadapterAMPLICONreverseadaptor -> INTERESTINGAMPLICON

I am aware that an AWK script could be used for this, but I think it would be a nice addition to the package.

Regards Eivind
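For illustration, the requested behaviour can be sketched in a few lines of Python. This is only a minimal sketch, not Cutadapt functionality: the function name `excise_adapter` is hypothetical, it handles exact matches only (no mismatches or indels, unlike Cutadapt's error-tolerant matching), and it assumes the quality string has the same length as the sequence.

```python
def excise_adapter(seq: str, qual: str, adapter: str) -> tuple[str, str]:
    """Remove the first exact occurrence of `adapter` from `seq` and join
    the flanking parts. The matching quality values are removed as well.

    Hypothetical helper for illustration; not part of Cutadapt.
    """
    start = seq.find(adapter)
    if start == -1:
        # Adapter not found: return the read unchanged.
        return seq, qual
    end = start + len(adapter)
    return seq[:start] + seq[end:], qual[:start] + qual[end:]


# Unmerged read from the example above.
read = "INTERESTINGadapterAMPLICON"
seq, qual = excise_adapter(read, "I" * len(read), "adapter")

# Merged read: excise the internal adapter, then the trailing one.
merged = "INTERESTINGadapterAMPLICONreverseadaptor"
m_seq, m_qual = excise_adapter(merged, "I" * len(merged), "adapter")
m_seq, m_qual = excise_adapter(m_seq, m_qual, "reverseadaptor")
```

Both calls leave `INTERESTINGAMPLICON` as the sequence, with a quality string of the same length.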

marcelm commented 12 months ago

Hi, thanks for the suggestion. Can you explain a bit why you think this would be useful? I haven’t encountered a situation where that would make much sense, mostly because I think the resulting sequence would no longer be based in reality; it would be something artificial. So far, all sequence modifications in Cutadapt remove only a prefix and/or a suffix of the input sequence. One can then argue that this only changes one’s "view" of the sequence, but doesn’t actually change it.

The principle that the output is a substring of the input (that is, it is fully described by a start and end coordinate within the original sequence) is ingrained quite deeply in Cutadapt, and changing this would require some effort.
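The substring principle can be sketched as follows. This is an illustrative model only, not Cutadapt's internal data structure: the `View` class and its method names are hypothetical. Every prefix or suffix trim just narrows a `(start, end)` window on the untouched input, whereas excising an internal adapter yields a sequence that no single window can describe.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class View:
    """A window [start, end) onto an unmodified read (hypothetical model)."""
    start: int
    end: int

    def trim_prefix(self, n: int) -> "View":
        # Removing a prefix only moves the start coordinate to the right.
        return View(min(self.start + n, self.end), self.end)

    def trim_suffix(self, n: int) -> "View":
        # Removing a suffix only moves the end coordinate to the left.
        return View(self.start, max(self.end - n, self.start))

    def apply(self, seq: str) -> str:
        return seq[self.start:self.end]


read = "INTERESTINGadapterAMPLICON"

# Trimming everything up to and including the adapter is a prefix removal.
view = View(0, len(read)).trim_prefix(len("INTERESTINGadapter"))
trimmed = view.apply(read)

# The excised result is not a substring of the read, so no (start, end)
# pair can describe it.
excised_is_substring = "INTERESTINGAMPLICON" in read
```

Here `trimmed` is `AMPLICON`, a plain slice of the input, while `excised_is_substring` is `False`.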

EivindStensrud commented 12 months ago

Without going too much into detail, I am working with unique molecular identifiers (UMIs), and this addition could streamline UMI-based error correction models, which correct PCR- and sequencing-induced errors independently for every DNA template molecule. With this approach, we could avoid having to move the UMI to the header, remove the primer adapters, and finally move the UMI back onto the sequence.

Regards