marcelm / cutadapt

Cutadapt removes adapter sequences from sequencing reads
https://cutadapt.readthedocs.io
MIT License

Enhancement request: trim but keep adaptor, --action=trimupto #443

Closed: peterjc closed this issue 3 years ago

peterjc commented 4 years ago

I have a use case in mind where, rather than an adapter or primer sequence that I want to match and remove, I have markers for a region of interest, and I want the (possibly inexactly matched) marker to be retained in the output.

Currently there are four action modes (correct as of cutadapt v2.8):

$ cutadapt -h | grep "\-\-action" -A 4
  --action {trim,mask,lowercase,none}
                        What to do with found adapters. mask: replace with 'N'
                        characters; lowercase: convert to lowercase; none:
                        leave unchanged (useful with --discard-untrimmed).
                        Default: trim

Would you consider a new action mode, with the suggested name trimupto (trim up to) or trimuntil, as follows:

  --action {trim,mask,lowercase,trimupto,none}
                        What to do with found adapters. mask: replace with 'N'
                        characters; lowercase: convert to lowercase; trimupto:
                        trim up to but retaining the adapter; none: leave
                        unchanged (useful with --discard-untrimmed).
                        Default: trim

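Left (5') adapter example. example.fasta contains a single record, >example, whose sequence is the one printed unchanged under --action none: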

$ for ACTION in trim mask lowercase none; do echo; echo "Using --action $ACTION:"; cutadapt -g AAAAAA -o - example.fasta --quiet --discard-untrimmed --action $ACTION; done

Using --action trim:
>example
XXXXXXXXXXXGGGGGGRRRRRRR

Using --action mask:
>example
NNNNNNNNNNNNNNXXXXXXXXXXXGGGGGGRRRRRRR

Using --action lowercase:
>example
llllllllaaaaaaXXXXXXXXXXXGGGGGGRRRRRRR

Using --action none:
>example
LLLLLLLLAAAAAAXXXXXXXXXXXGGGGGGRRRRRRR

Proposed output:

Using --action trimupto:
>example
AAAAAAXXXXXXXXXXXGGGGGGRRRRRRR
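To make the proposed left-adapter semantics concrete, here is a minimal POSIX shell sketch on a single sequence string. The function name retain_5prime is hypothetical and not part of Cutadapt; it assumes an exact match (Cutadapt itself tolerates mismatches and indels), uses the first occurrence of the adapter, and assumes the adapter contains no shell glob characters.

```shell
#!/bin/sh
# retain_5prime is a HYPOTHETICAL helper, not part of Cutadapt. It mimics the
# proposed trimupto behaviour for a 5' (-g) adapter using an exact match only.
# Assumes the adapter contains no shell glob characters (* ? [).
retain_5prime() {
    sequence=$1
    adapter=$2
    case $sequence in
        *"$adapter"*)
            # ${sequence#*$adapter} strips everything up to and including the
            # first occurrence of the adapter (what --action trim keeps);
            # printing the adapter in front of it keeps the adapter itself.
            printf '%s%s\n' "$adapter" "${sequence#*$adapter}"
            ;;
        *)
            # No match: print nothing, as with --discard-untrimmed.
            return 1
            ;;
    esac
}

retain_5prime LLLLLLLLAAAAAAXXXXXXXXXXXGGGGGGRRRRRRR AAAAAA
```

Run on the example sequence, this prints AAAAAAXXXXXXXXXXXGGGGGGRRRRRRR, matching the proposed output above: it is the --action trim result with the adapter put back in front.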

Linked left and right adapter example:

$ for ACTION in trim mask lowercase none; do echo; echo "Using --action $ACTION:"; cutadapt -g AAAAAA...GGGGGG -o - example.fasta --quiet --discard-untrimmed --action $ACTION; done

Using --action trim:
>example
XXXXXXXXXXX

Using --action mask:
>example
NNNNNNNNNNNNNNXXXXXXXXXXXNNNNNNNNNNNNN

Using --action lowercase:
>example
llllllllaaaaaaXXXXXXXXXXXggggggrrrrrrr

Using --action none:
>example
LLLLLLLLAAAAAAXXXXXXXXXXXGGGGGGRRRRRRR

Proposed output:

Using --action trimupto:
>example
AAAAAAXXXXXXXXXXXGGGGGG
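For the linked case, the proposed behaviour keeps both adapters and everything between them. A minimal POSIX shell sketch, assuming exact matches, the first FRONT occurrence followed by the first BACK after it, and adapters without shell glob characters; retain_linked is a hypothetical name, not Cutadapt code.

```shell
#!/bin/sh
# retain_linked is a HYPOTHETICAL helper, not part of Cutadapt. It mimics the
# proposed trimupto behaviour for a linked adapter FRONT...BACK, using exact
# matches only and assuming neither adapter contains shell glob characters.
retain_linked() {
    sequence=$1
    front=$2
    back=$3
    case $sequence in
        *"$front"*"$back"*) ;;   # both adapters present, FRONT before BACK
        *) return 1 ;;           # otherwise discard, as with --discard-untrimmed
    esac
    # Drop everything up to and including the first FRONT ...
    rest=${sequence#*$front}
    # ... then drop the first BACK after it and everything that follows.
    insert=${rest%%$back*}
    # Put both adapters back around the region of interest.
    printf '%s%s%s\n' "$front" "$insert" "$back"
}

retain_linked LLLLLLLLAAAAAAXXXXXXXXXXXGGGGGGRRRRRRR AAAAAA GGGGGG
```

On the example this prints AAAAAAXXXXXXXXXXXGGGGGG, i.e. the --action trim result XXXXXXXXXXX with both adapters restored.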

Right (3') adapter example:

$ for ACTION in trim mask lowercase none; do echo; echo "Using --action $ACTION:"; cutadapt -a GGGGGG -o - example.fasta --quiet --discard-untrimmed --action $ACTION; done

Using --action trim:
>example
LLLLLLLLAAAAAAXXXXXXXXXXX

Using --action mask:
>example
LLLLLLLLAAAAAAXXXXXXXXXXXNNNNNNNNNNNNN

Using --action lowercase:
>example
LLLLLLLLAAAAAAXXXXXXXXXXXggggggrrrrrrr

Using --action none:
>example
LLLLLLLLAAAAAAXXXXXXXXXXXGGGGGGRRRRRRR

Proposed output:

Using --action trimupto:
>example
LLLLLLLLAAAAAAXXXXXXXXXXXGGGGGG
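The right-adapter case mirrors the left one: cut after the adapter instead of before it. Below is a minimal POSIX shell sketch; retain_3prime is a hypothetical name, and the sketch only handles an exact, full-length adapter (Cutadapt also finds partial 3' adapters at the read end and tolerates errors) without shell glob characters.

```shell
#!/bin/sh
# retain_3prime is a HYPOTHETICAL helper, not part of Cutadapt. It mimics the
# proposed trimupto behaviour for a 3' (-a) adapter using an exact, full-length
# match only. Assumes the adapter contains no shell glob characters.
retain_3prime() {
    sequence=$1
    adapter=$2
    case $sequence in
        *"$adapter"*)
            # ${sequence%%$adapter*} strips the first occurrence of the adapter
            # and everything after it (what --action trim keeps); appending the
            # adapter again keeps the adapter itself.
            printf '%s%s\n' "${sequence%%$adapter*}" "$adapter"
            ;;
        *)
            return 1   # no match: discard, as with --discard-untrimmed
            ;;
    esac
}

retain_3prime LLLLLLLLAAAAAAXXXXXXXXXXXGGGGGGRRRRRRR GGGGGG
```

This prints LLLLLLLLAAAAAAXXXXXXXXXXXGGGGGG, the proposed output above. Later in the thread, this behaviour is added to Cutadapt itself as --action=retain.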

marcelm commented 4 years ago

Hi, I’m not working at the moment, so let me get back to you in a while, but one quick comment now: a colleague mentioned a request for this behavior to me a while ago, so I’ve had it in the back of my head. My idea was to add some extra notation to the adapter specification string that would tell Cutadapt where to cut, but perhaps it’s easier to implement this as an additional action.

peterjc commented 4 years ago

I'm encouraged that someone else also asked for this kind of behaviour.

Extra notation in the adapter specification string could work, and would be even more flexible than my current use case requires.

mariloubodde commented 3 years ago

Hi, I would also be interested in an option to discard the sequence outside the adapters but retain the adapters themselves. Are you planning to implement this?

I'm working on a project comparing targeted amplicon data with "in silico" amplified data; for the latter, I reconstruct the regions corresponding to the amplicon targets from shotgun sequencing reads. In my current pipeline I have some trouble with reads that overlap either of the primers by only a few bases, and retaining the primer sequences would resolve this.

peterjc commented 3 years ago

Excellent, and using --action=retain as the name makes sense to me too. It's shorter than my suggestions as well 👍

Thank you!

marcelm commented 3 years ago

I wanted to comment here, but the auto-close happened before I got around to it ...

Yes, this is now implemented as --action=retain. I hope the behavior is what you both requested. I had suggested earlier that a special marker in the adapter specification would be a good way to do this, but I realized that implementing it as a separate action is actually a lot easier, and not adding extra notation makes things easier for users.

Documentation is at https://cutadapt.readthedocs.io/en/latest/guide.html#action .

@peterjc The name "retain" actually comes from you, because you wrote

I want the [...] marker to be *retained* in the output

(emphasis mine). It’s a word I rarely use otherwise, so that makes it easy to search for in the documentation.

I’ll release Cutadapt 3.1 with this feature included soon.

peterjc commented 3 years ago

Lovely - I'm on leave right now, but hopefully I'll get to try this out next month. Reading the documentation you've added, it should do what I was hoping for.

mariloubodde commented 3 years ago

Thanks a lot! It looks like this should do exactly what I was hoping for; I will try it out tomorrow!

(Sent by email in reply to Marcel Martin's comment of 2020-12-03 13:45.)