usadellab / Trimmomatic


Making a custom adapter file - confused about the sequences to include #20

Open lucygarner opened 2 years ago

lucygarner commented 2 years ago

Hi,

My issue is similar to that in #14, but I am still a bit confused about this.

According to NEB, the sequences that I need to trim off are:

Adaptor Read1: AGATCGGAAGAGCACACGTCTGAACTCCAGTCA
Adaptor Read2: AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT

https://international.neb.com/faqs/2021/01/15/what-sequences-need-to-be-trimmed-for-nebnext-libraries-that-are-sequenced-on-an-illumina-instrument

However, based on the discussion in #14, it looks like these would not correspond to PrefixPE/1 and PrefixPE/2 as I thought. From my understanding, the sequences provided by NEB are those that are likely to contaminate Read 1 and Read 2, respectively, due to read-through. How and why do these sequences need to be modified for use with Trimmomatic? Please could you supply the correct sequences to use?

Also, I see that in the TruSeq3-PE-2.fa file, you supply some additional sequences e.g. PE1 and PE1_rc - why are these added?

>PrefixPE/1
TACACTCTTTCCCTACACGACGCTCTTCCGATCT
>PrefixPE/2
GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT
>PE1
TACACTCTTTCCCTACACGACGCTCTTCCGATCT
>PE1_rc
AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA
>PE2
GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT
>PE2_rc
AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC

Many thanks, Lucy

lucygarner commented 2 years ago

Is this correct? Do I want to add any further sequences?

>PrefixPE/1
ACACTCTTTCCCTACACGACGCTCTTCCGATCT
>PrefixPE/2
TGACTGGAGTTCAGACGTGTGCTCTTCCGATCT

TonyBolger commented 2 years ago

The sequences NEB provide are essentially the same, swapped and in the opposite orientation. Trimmomatic does not detect adapter read-through by directly searching the reads for these specific sequences. Instead, it compares a 'prefixed' version of the forward read against a 'prefixed' version of the reverse read (with reverse complement applied). What you need to provide to Trimmomatic are these 'prefix sequences'.

I'm not sure why the NEB sequences are one base shorter - perhaps their library prep creates a different base than normal at that position. If so, it might be marginally beneficial to shorten the prefix, as you suggest, but I would not expect a dramatic difference.
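To make the "swapped and in the opposite orientation" relationship concrete, here is a minimal Python sketch that derives the shortened prefix sequences from the two NEB sequences quoted in this thread. The helper `reverse_complement` and the variable names are illustrative only and are not part of Trimmomatic.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T, N only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(complement[base] for base in reversed(seq.upper()))


# Sequences quoted from the NEB FAQ earlier in this thread.
NEB_ADAPTOR_READ1 = "AGATCGGAAGAGCACACGTCTGAACTCCAGTCA"
NEB_ADAPTOR_READ2 = "AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT"

# Prefix sequences as quoted from TruSeq3-PE-2.fa earlier in this thread.
TRUSEQ_PREFIX_PE1 = "TACACTCTTTCCCTACACGACGCTCTTCCGATCT"
TRUSEQ_PREFIX_PE2 = "GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT"

# "Swapped": the Read2 sequence gives PrefixPE/1, the Read1 sequence gives PrefixPE/2.
# "Opposite orientation": each one is reverse-complemented.
prefix_pe1 = reverse_complement(NEB_ADAPTOR_READ2)
prefix_pe2 = reverse_complement(NEB_ADAPTOR_READ1)

print(">PrefixPE/1")
print(prefix_pe1)
print(">PrefixPE/2")
print(prefix_pe2)

# The TruSeq3 prefixes carry one extra leading base compared with the NEB-derived ones.
print(TRUSEQ_PREFIX_PE1[1:] == prefix_pe1)
print(TRUSEQ_PREFIX_PE2[1:] == prefix_pe2)
```

The derived sequences are the one-base-shorter prefixes proposed above. Whether to keep or drop the extra leading base is the judgment call discussed in this comment.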

The additional sequences in the PE-2 file are only needed if there was some blunt ligation happening during library prep, which happens if the library kit is degraded, e.g. lacking the Y structure or the A overhang. Clean libraries prepped with modern kits rarely have this issue, but sometimes people need to work with old data, so they're still included.

lucygarner commented 2 years ago

Thank you for the thorough answer. Is there any harm in including the extra sequences in the adapter FASTA with new data?

lucygarner commented 2 years ago

My current adapter file is as follows:

>PrefixPE/1
ACACTCTTTCCCTACACGACGCTCTTCCGATCT
>PrefixPE/2
TGACTGGAGTTCAGACGTGTGCTCTTCCGATCT
>PCR_Primer1
ACACTCTTTCCCTACACGACGCTCTTCCGATCT
>PCR_Primer1_rc
AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT
>PCR_Primer2
TGACTGGAGTTCAGACGTGTGCTCTTCCGATCT
>PCR_Primer2_rc
AGATCGGAAGAGCACACGTCTGAACTCCAGTCA

Does this seem reasonable? As a reminder, the sequences provided by NEB are as follows:

Adaptor Read1: AGATCGGAAGAGCACACGTCTGAACTCCAGTCA
Adaptor Read2: AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT

Dahn-YoungDong commented 1 year ago

@lucygarner Hello, I have exactly the same issue. Could you share how you solved it, or how you moved on with the adapter trimming?

lucygarner commented 1 year ago

I used this for my adapter file in the end:

>PrefixPE/1
ACACTCTTTCCCTACACGACGCTCTTCCGATCT
>PrefixPE/2
TGACTGGAGTTCAGACGTGTGCTCTTCCGATCT
>PCR_Primer1
ACACTCTTTCCCTACACGACGCTCTTCCGATCT
>PCR_Primer1_rc
AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT
>PCR_Primer2
TGACTGGAGTTCAGACGTGTGCTCTTCCGATCT
>PCR_Primer2_rc
AGATCGGAAGAGCACACGTCTGAACTCCAGTCA
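As a sanity check on a file like this, the following Python sketch builds the FASTA text from a dictionary and confirms that every `_rc` entry really is the reverse complement of its partner. The functions `to_fasta` and `check_rc_pairs` are hypothetical helpers written for this illustration. The check covers only internal consistency and says nothing about whether these are the right adapters for a given library.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T, N only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(complement[base] for base in reversed(seq.upper()))


# Entries copied from the adapter file posted in this comment.
adapters = {
    "PrefixPE/1": "ACACTCTTTCCCTACACGACGCTCTTCCGATCT",
    "PrefixPE/2": "TGACTGGAGTTCAGACGTGTGCTCTTCCGATCT",
    "PCR_Primer1": "ACACTCTTTCCCTACACGACGCTCTTCCGATCT",
    "PCR_Primer1_rc": "AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT",
    "PCR_Primer2": "TGACTGGAGTTCAGACGTGTGCTCTTCCGATCT",
    "PCR_Primer2_rc": "AGATCGGAAGAGCACACGTCTGAACTCCAGTCA",
}


def to_fasta(records: dict) -> str:
    """Format name/sequence pairs as FASTA text, one record per two lines."""
    return "".join(f">{name}\n{seq}\n" for name, seq in records.items())


def check_rc_pairs(records: dict) -> list:
    """Return the names of '_rc' entries that do not match their partner."""
    mismatches = []
    for name, seq in records.items():
        if name.endswith("_rc"):
            partner = name[: -len("_rc")]
            if partner not in records or reverse_complement(records[partner]) != seq:
                mismatches.append(name)
    return mismatches


print(to_fasta(adapters), end="")
print("Mismatched _rc entries:", check_rc_pairs(adapters))
```

An empty list from `check_rc_pairs` means each forward/`_rc` pair is consistent.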

I hope it's correct!

Dahn-YoungDong commented 1 year ago

@lucygarner Hi thanks. I was only able to remove the reverse read adapters, yet not in the forward read (maybe only 10% removal). Did you have similar issues?

lucygarner commented 1 year ago

I didn't actually have much adapter contamination, so it was hard to test thoroughly. I would be interested to get @TonyBolger's thoughts.

lucygarner commented 1 year ago

> @lucygarner Hi thanks. I was only able to remove the reverse read adapters, yet not in the forward read (maybe only 10% removal). Did you have similar issues?

@Dahn-YoungDong, what adapter sequence did you have in the forward read? Is it the expected "AGATCGGAAGAGCACACGTCTGAACTCCAGTCA"?
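One rough way to answer that question is to count how many forward reads contain the start of the expected adapter, before and after trimming. The sketch below is an illustration only: `fastq_sequences` and `adapter_fraction` are hypothetical helpers, the 12-base seed length is an arbitrary choice, the two reads are synthetic toy data, and the search finds exact matches only, so it will undercount adapters with sequencing errors.

```python
import io
from typing import Iterable, Iterator


def fastq_sequences(lines: Iterable[str]) -> Iterator[str]:
    """Yield the sequence line (second of every four lines) from FASTQ text."""
    for index, line in enumerate(lines):
        if index % 4 == 1:
            yield line.strip()


def adapter_fraction(lines: Iterable[str], adapter: str, seed_len: int = 12) -> float:
    """Fraction of reads containing the first seed_len bases of the adapter."""
    seed = adapter[:seed_len]
    total = 0
    hits = 0
    for seq in fastq_sequences(lines):
        total += 1
        if seed in seq:
            hits += 1
    return hits / total if total else 0.0


# Adapter expected in the forward read, as quoted in this thread.
NEB_ADAPTOR_READ1 = "AGATCGGAAGAGCACACGTCTGAACTCCAGTCA"

# Synthetic two-read example: read1 runs into the adapter, read2 does not.
toy_fastq = "\n".join([
    "@read1", "ACGTACGTACGTAGATCGGAAGAGCACACG", "+", "I" * 30,
    "@read2", "ACGTACGTACGTACGTACGTACGTACGTAC", "+", "I" * 30,
]) + "\n"

print(adapter_fraction(io.StringIO(toy_fastq), NEB_ADAPTOR_READ1))
```

For real data, the same function could be given a file handle instead of the `io.StringIO` object, for example one opened with `gzip.open(path, "rt")` for a gzipped FASTQ file.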