chollenbeck closed this 3 years ago
Hey Chris,
Thanks for letting me know about this. I will try to test on my end as well.
Was this ever resolved? In theory, a FASTA of the adapters can be provided and fastp hardcoded to read that FASTA, like you have it. There's a great file of all kinds of them here. The docs also mention doing this for the TruSeq adapters:
The most widely used adapter is the Illumina TruSeq adapters. If your data is from the TruSeq library, you can add
--adapter_sequence=AGATCGGAAGAGCACACGTCTGAACTCCAGTCA --adapter_sequence_r2=AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT
to your command lines
I think versions beyond 0.20.0 now autodetect a whole list of Illumina adapters. This should be fixed with the new version requirement of fastp.
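For anyone wanting to try the adapter FASTA route, here is a minimal sketch of how it might be set up. It writes the two TruSeq sequences quoted from the docs above into a FASTA file and builds a single-end fastp command that points at it with --adapter_fasta. The record names, the output file name truseq_adapters.fa, and the helper functions are hypothetical; only the sequences and the -i, -o and --adapter_fasta options come from fastp itself.

```python
from pathlib import Path

# TruSeq adapter sequences as quoted from the fastp docs.
# The record names are made up for illustration.
TRUSEQ_ADAPTERS = {
    "TruSeq_Adapter_R1": "AGATCGGAAGAGCACACGTCTGAACTCCAGTCA",
    "TruSeq_Adapter_R2": "AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT",
}


def write_adapter_fasta(path, adapters=TRUSEQ_ADAPTERS):
    """Write the adapters to a FASTA file and return its path."""
    lines = []
    for name, seq in adapters.items():
        lines.append(f">{name}")
        lines.append(seq)
    out = Path(path)
    out.write_text("\n".join(lines) + "\n")
    return out


def build_fastp_command(in_fq, out_fq, adapter_fasta):
    """Build a single-end fastp command that also trims the FASTA adapters."""
    return [
        "fastp",
        "-i", str(in_fq),
        "-o", str(out_fq),
        "--adapter_fasta", str(adapter_fasta),
    ]


if __name__ == "__main__":
    fasta = write_adapter_fasta("truseq_adapters.fa")
    cmd = build_fastp_command("P_007.F.fq.gz", "P_007.trimfa.fq.gz", fasta)
    # Print the command rather than running it, so it can be checked first.
    print(" ".join(cmd))
```

The printed command can be pasted into a shell once the input file name matches your data.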
Hi Jon,
I think that trimming with fastp would benefit from adding the actual adapters using the --adapter_fasta flag. Below I've pasted the results of a small test showing the presence of adapters from readthrough of the P2 in the forward reads of one sample. P_007.F.fq.gz is untrimmed, P_007.nofa.fq.gz is trimmed with only the adaptive trimming, and P_007.trimfa.fq.gz uses the --adapter_fasta flag and the TruSeq adapter FASTA. I'm grepping for the P2 here:

I think what might be happening is that the auto adapter detection in fastp is mistaking some other repetitive sequences in the data for the adapters. It prints the sequence it detects, but it's not the true adapter:

I can't see that there is a way to turn off the auto-detection, but it seems that adding the TruSeq adapters specifically lets it search for those in addition to the ones it detects.
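The grep comparison described above could be reproduced with something like the sketch below, which counts how many reads in each gzipped FASTQ still contain an adapter fragment. The search string is an assumption: it is the 13-base prefix shared by the two TruSeq sequences in the fastp docs, so swap in the actual P2 sequence for your library. The function name is hypothetical.

```python
import gzip
from pathlib import Path

# Assumption: prefix shared by both TruSeq adapter sequences in the fastp
# docs. Replace with the real P2 sequence for your library.
ADAPTER = "AGATCGGAAGAGC"


def count_reads_with_adapter(fastq_gz, adapter=ADAPTER):
    """Return (reads containing the adapter, total reads) for a gzipped FASTQ."""
    hits = 0
    total = 0
    with gzip.open(fastq_gz, "rt") as handle:
        for i, line in enumerate(handle):
            # In a 4-line FASTQ record the sequence is the second line.
            if i % 4 == 1:
                total += 1
                if adapter in line:
                    hits += 1
    return hits, total


if __name__ == "__main__":
    # File names from the test described in this issue.
    for name in ("P_007.F.fq.gz", "P_007.nofa.fq.gz", "P_007.trimfa.fq.gz"):
        if Path(name).exists():
            hits, total = count_reads_with_adapter(name)
            print(f"{name}: {hits} of {total} reads contain {ADAPTER}")
```

If the adapter FASTA is doing its job, the count for the file trimmed with --adapter_fasta should be much lower than for the other two.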