Closed plijnzaad closed 4 years ago
Thanks for this reproducible report. That is not the intended behavior.
Ah, I see the issue - you are using -a, which matches adapters at the 3' end of the read. Try using -g instead.
There is also the concept of non-internal adapters (https://cutadapt.readthedocs.io/en/stable/guide.html#non-internal-5-and-3-adapters). I will open an issue to port this behavior over from Cutadapt.
@plijnzaad please reopen if this doesn't solve your issue.
I have been playing around with N{number} prefixes before an adapter in order to get rid of the adapter and, say, (up to) 14 nucleotides before it. However, it turns out that using e.g.
-a 'N{6}GATCGTCGGACTGTAGAACTCTGAAC'
(and also the equivalent NNNNNNGATCGTCGGACTGTAGAACTCTGAAC) shortens all reads that do not contain any adapter whatsoever by the length of the N-prefix?! I.e. the sequence ^ATATGCGC$ gets shortened to ^ATAT$ using adapter 'NNNNTGCA', more or less as if the adapter is 'slid backwards' over the read until it mismatches, and then the cut is made. The incantation (atropos version 2.0.0a5.post20200601, python 3.6.1) I used was
The input and result files are attached (all renamed to *txt because GitHub won't allow me otherwise).
Is this the way it is supposed to work, or am I doing something wrong? I don't think this used to be the behaviour.
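One possible reading of the shortening described above: a 3' adapter given with -a is allowed to match only partially, running off the end of the read, and an all-N prefix matches any bases it overlaps. The Python sketch below is a deliberately simplified model of that idea and not Atropos's actual alignment code. The function names, the exact-match rule (no mismatches or indels), and the minimum overlap of 3 are assumptions made for illustration.

```python
def base_matches(adapter_base: str, read_base: str) -> bool:
    # Assumption: 'N' in the adapter acts as a wildcard for any read base.
    return adapter_base == "N" or adapter_base == read_base


def trim_3prime(read: str, adapter: str, min_overlap: int = 3) -> str:
    """Trim a 3' adapter that may extend past the end of the read.

    Simplified model: exact matching only, leftmost match wins.
    """
    for start in range(len(read) - min_overlap + 1):
        # Only the part of the adapter that fits inside the read is compared.
        overlap = min(len(adapter), len(read) - start)
        window = read[start:start + overlap]
        if all(base_matches(a, r) for a, r in zip(adapter[:overlap], window)):
            return read[:start]
    return read


# The read from the report contains no TGCA, yet the NNNN prefix alone
# "matches" the last four bases, so they are cut off.
print(trim_3prime("ATATGCGC", "NNNNTGCA"))  # ATAT
# Without the N prefix the same read is left untouched.
print(trim_3prime("ATATGCGC", "TGCA"))      # ATATGCGC
```

Under this model every adapter-free read loses as many bases as there are leading Ns, which is consistent with the behaviour reported here, though the real cause inside Atropos may differ.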
(Incidentally, it would be really useful if the N{k} syntax would be Perl-regexp-like, so that you can supply a range of lengths for the wild-card region.)
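To make the suggestion concrete, here is a minimal Python sketch of what a Perl-regexp-like range could expand to. It is purely hypothetical: the N{min,max} form is the feature being requested, not something the thread says Atropos supports, and the helper name expand_adapter is invented for this example.

```python
import re
from typing import List

# Matches e.g. N{6} or the hypothetical range form N{0,14}.
_REPEAT = re.compile(r"([A-Za-z])\{(\d+)(?:,(\d+))?\}")


def expand_adapter(spec: str) -> List[str]:
    """Expand repeat counts in an adapter spec into plain sequences."""
    m = _REPEAT.search(spec)
    if m is None:
        return [spec]
    base, low = m.group(1), int(m.group(2))
    high = int(m.group(3)) if m.group(3) is not None else low
    if high < low:
        raise ValueError(f"invalid range in {spec!r}")
    expanded: List[str] = []
    for count in range(low, high + 1):
        candidate = spec[:m.start()] + base * count + spec[m.end():]
        # Recurse in case the spec contains further repeat groups.
        expanded.extend(expand_adapter(candidate))
    return expanded


print(expand_adapter("N{2}ACGT"))    # ['NNACGT']
print(expand_adapter("N{0,2}ACGT"))  # ['ACGT', 'NACGT', 'NNACGT']
```

Expanding a range into several fixed-length adapters is only one possible design. A real implementation might instead handle the variable-length wildcard region inside the aligner.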