jdidion / atropos

An NGS read trimming tool that is specific, sensitive, and speedy. (production)

Can atropos remove 5' adapter variants that are incomplete from the tail? #128

Open lokapal opened 3 years ago

lokapal commented 3 years ago
Hello!

Just to clarify: as far as I understand, atropos (like cutadapt) cannot remove 5' adapters that are incomplete at the tail rather than at the head? I.e., if my adapter is MYVERYLONGADAPTER and I have a lot of reads like

read1 MYVERYLONGAmysequence1
read2 MYVERYLOmysequence2

then can the adapter leftovers be removed only by listing every possible variant in the adapter.fa file? I just installed atropos 1.1.31 system-wide with "python3.7 -m pip install atropos".
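A minimal Python sketch of the trimming the question asks for (this is not an atropos feature or API): find the longest prefix of the adapter that the read starts with and drop it, so both MYVERYLONGA and MYVERYLO are removed without listing variants by hand. It assumes the truncated adapter sits at the very start of the read and matches exactly; `min_len` is a hypothetical parameter that keeps very short chance matches from being trimmed. `prefix_variants_fasta` produces the "list every variant in adapter.fa" workaround as FASTA text, for comparison.

```python
def longest_adapter_prefix(read: str, adapter: str, min_len: int = 3) -> int:
    """Length of the longest adapter prefix the read starts with (0 if none >= min_len)."""
    # Try the longest candidate first so a full adapter wins over its fragments.
    for k in range(min(len(adapter), len(read)), min_len - 1, -1):
        if read.startswith(adapter[:k]):
            return k
    return 0


def trim_truncated_5prime(read: str, adapter: str, min_len: int = 3) -> str:
    """Remove a 5' adapter that may be missing its 3' end (its 'tail')."""
    return read[longest_adapter_prefix(read, adapter, min_len):]


def prefix_variants_fasta(name: str, adapter: str, min_len: int = 3) -> str:
    """Every prefix of `adapter` with length >= min_len, as FASTA records."""
    records = [f">{name}_{k}\n{adapter[:k]}" for k in range(len(adapter), min_len - 1, -1)]
    return "\n".join(records) + "\n"


if __name__ == "__main__":
    adapter = "MYVERYLONGADAPTER"  # placeholder adapter from the question
    print(trim_truncated_5prime("MYVERYLONGAmysequence1", adapter))  # mysequence1
    print(trim_truncated_5prime("MYVERYLOmysequence2", adapter))     # mysequence2
```

With a small `min_len`, reads whose own sequence happens to begin with the first few adapter bases get trimmed spuriously; raising it trades sensitivity for specificity. Tolerating sequencing errors would need an approximate (edit-distance) match instead of `str.startswith`.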

jdidion commented 3 years ago

Do you have an example of a library prep that would produce reads with these characteristics?

lokapal commented 3 years ago

Of course I do; it's not a theoretical question. Please find attached an example with two marked-up entries (reads.fa.gz). It's a 4C library. There are three adapters: A1, A2, and Illumina/IlluminaPE. A1D and A2D are the direct adapters; A1RC and A2RC are the reverse-complement adapters.
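Since each adapter is listed both directly (A1D, A2D) and as its reverse complement (A1RC, A2RC), the RC entries can be derived from the direct sequences rather than typed by hand. A small sketch follows; the real A1/A2 sequences are not given in the thread, so `ADAPTERS` holds hypothetical placeholders, and the D/RC suffixes simply mirror the naming used above.

```python
# Complement map covering IUPAC codes in both cases (N, S and W map to themselves).
_COMPLEMENT = str.maketrans(
    "ACGTRYKMBVDHNSWacgtrykmbvdhnsw",
    "TGCAYRMKVBHDNSWtgcayrmkvbhdnsw",
)


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def with_reverse_complements(adapters: dict) -> dict:
    """Expand {name: seq} into {nameD: seq, nameRC: reverse_complement(seq)}."""
    expanded = {}
    for name, seq in adapters.items():
        expanded[name + "D"] = seq
        expanded[name + "RC"] = reverse_complement(seq)
    return expanded


# Hypothetical placeholder sequences, NOT the real A1/A2 adapters.
ADAPTERS = {"A1": "AACGTG", "A2": "GGATCC"}

if __name__ == "__main__":
    # Print the expanded set as FASTA records.
    for name, seq in with_reverse_complements(ADAPTERS).items():
        print(f">{name}\n{seq}")
```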

jdidion commented 3 years ago

Cool, thanks. I’ll look into it. If you have a reference for the protocol, that would be helpful.

(Sent by email in reply to lokapal's comment of Aug 7, 2021, 10:59 AM.)

jdidion commented 3 years ago

Found this; I assume this is the standard protocol? https://www.sciencedirect.com/science/article/pii/S1046202318304742

Here they are trimming all reads at the same position (i.e. the -u flag of atropos). Is what they suggest standard, or is variable-length trimming like you're trying to do more the norm?
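To contrast the two approaches, a fixed-position cut can be sketched as below. It assumes atropos's -u follows cutadapt's convention (a positive value removes that many bases from the 5' end, a negative value from the 3' end); please confirm against the atropos documentation. Applied to the two example reads from the question, one cut length fits the longer adapter remnant but over-trims the read with the shorter one.

```python
def cut_fixed(read: str, n: int) -> str:
    """Remove a fixed number of bases, assuming cutadapt-style -u semantics.

    Positive n removes bases from the 5' end; negative n removes them from the 3' end.
    """
    if n >= 0:
        return read[n:]
    return read[:n]


if __name__ == "__main__":
    # Cut length chosen to fit the 11-base remnant "MYVERYLONGA".
    print(cut_fixed("MYVERYLONGAmysequence1", 11))  # mysequence1
    # The same cut removes 3 bases of insert after the 8-base remnant "MYVERYLO".
    print(cut_fixed("MYVERYLOmysequence2", 11))     # equence2
```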

lokapal commented 3 years ago

I can't speak for "all" researchers, but "my" wet-lab biologists constantly supply me with libraries in which different reads may contain two 4C adapters or only one, two full adapters, or one full and one incomplete adapter. Previously I always had SE reads, and it was much simpler: I always cut at the full 5' adapter (ANY of them) and didn't care what came BEFORE it. But now I have PE reads, and it is much more complicated, as you can see from the attached example.
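The SE strategy described above (cut through the full 5' adapter, whichever one is present, and ignore whatever precedes it) might be sketched as follows. This is one interpretation, assuming exact matches and a hypothetical adapter list; taking the rightmost match means a read carrying two adapters loses both. It does not address the truncated adapters or mate pairing that make the PE case harder.

```python
from typing import Sequence


def trim_through_any_adapter(read: str, adapters: Sequence[str]) -> str:
    """Drop everything up to and including the rightmost full adapter match.

    Exact matching only; the read is returned unchanged if no adapter is found.
    """
    cut = 0
    for adapter in adapters:
        pos = read.rfind(adapter)
        if pos != -1:
            cut = max(cut, pos + len(adapter))
    return read[cut:]


if __name__ == "__main__":
    adapters = ["AAAA", "CCCC"]  # hypothetical placeholders, not real 4C adapters
    print(trim_through_any_adapter("GGAAAATTCCCCgenomic", adapters))  # genomic
    print(trim_through_any_adapter("TTAAAAgenomic", adapters))        # genomic
```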