jpuritz / dDocent

a bash pipeline for RAD sequencing
ddocent.com
MIT License

Adaptive trimming of adapters with fastp #52

Closed chollenbeck closed 3 years ago

chollenbeck commented 4 years ago

Hi Jon,

I think that trimming with fastp would benefit from adding the actual adapters using the --adapter_fasta flag. Below I've pasted the results of a small test showing the presence of adapters from readthrough of the P2 in the forward reads of one sample. P_007.F.fq.gz is untrimmed, P_007.nofa.fq.gz is trimmed with only the adaptive trimming, and P_007.trimfa.fq.gz uses the --adapter_fasta flag and the TruSeq adapter FASTA. I'm grepping for the P2 here:

$ zgrep -c AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC P_007.F.fq.gz 
2144
$ zgrep -c AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC P_007.nofa.fq.gz 
2144
$ zgrep -c AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC P_007.trimfa.fq.gz 
0
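For anyone who wants to repeat this check on their own samples, the three commands above can be wrapped in a small helper. This is only a sketch: `count_adapter` is a hypothetical function name, and it uses `gzip -cd | grep -c`, which should give the same per-file count as `zgrep -c`.

```shell
# P2 readthrough sequence used in the zgrep checks above
ADAPTER="AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC"

# count_adapter (hypothetical helper): print "<file><TAB><count>" for each
# gzipped FASTQ given as an argument, where <count> is the number of lines
# that contain the adapter sequence.
count_adapter() {
    for fq in "$@"; do
        # grep -c exits non-zero when the count is 0, so "|| true" keeps
        # the loop going and the 0 is still printed.
        n=$(gzip -cd "$fq" | grep -c "$ADAPTER" || true)
        printf '%s\t%s\n' "$fq" "$n"
    done
}

# Example call (file names taken from the test above):
# count_adapter P_007.F.fq.gz P_007.nofa.fq.gz P_007.trimfa.fq.gz
```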

I think what might be happening is the auto adapter detection in fastp is mistaking some other repetitive sequences in the data for the adapters. It prints the sequence it detects, but it's not the true adapter:

Detecting adapter sequence for read1...
CATGCAATATTCTAACTGATAAATCATGACATGACATCCAAGATTTCAAAATGTCATGCC

I can't see a way to turn off the auto-detection, but adding the TruSeq adapters explicitly seems to make fastp search for those in addition to the ones it detects.

jpuritz commented 4 years ago

Hey Chris,

Thanks for letting me know about this. I will try to test on my end as well.

pdimens commented 3 years ago

Was this ever resolved? In theory, a FASTA of the adapters can be provided and fastp hardcoded to read that FASTA like you have it. There's a great file of all kinds of them here. The docs also mention doing this for the TruSeq adapters:

The most widely used adapter is the Illumina TruSeq adapters. If your data is from the TruSeq library, you can add --adapter_sequence=AGATCGGAAGAGCACACGTCTGAACTCCAGTCA --adapter_sequence_r2=AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT to your command lines
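A rough sketch of what the FASTA route could look like for a single forward-read file. The file name `truseq_adapters.fa` and the record names are made up for illustration; the two sequences are the ones quoted from the docs above, and the input and output names are from the test earlier in this thread. The fastp call is guarded so the snippet does nothing beyond writing the FASTA if fastp or the input file is missing.

```shell
# Write a small adapter FASTA (hypothetical file and record names;
# sequences are the TruSeq adapters quoted from the fastp docs).
cat > truseq_adapters.fa <<'EOF'
>TruSeq_Adapter_R1
AGATCGGAAGAGCACACGTCTGAACTCCAGTCA
>TruSeq_Adapter_R2
AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT
EOF

# Trim with the explicit adapter list, only if fastp is installed and
# the input file is present.
if command -v fastp >/dev/null 2>&1 && [ -f P_007.F.fq.gz ]; then
    fastp -i P_007.F.fq.gz -o P_007.trimfa.fq.gz \
        --adapter_fasta truseq_adapters.fa
fi
```

Going by the docs quote, passing `--adapter_sequence` and `--adapter_sequence_r2` directly on the command line would be the alternative for TruSeq-only data; the FASTA just makes it easier to add more adapters later.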

jpuritz commented 3 years ago

I think fastp versions beyond 0.20.0 now autodetect a whole list of Illumina adapters, so this should be fixed by the new fastp version requirement.