timflutre / trimmomatic

Read trimming tool for Illumina NGS data.
http://www.usadellab.org/cms/index.php?page=trimmomatic

Trimmomatic drops reads instead of clipping adapters with ILLUMINACLIP #2

Closed: MatthewRalston closed this issue 8 years ago

MatthewRalston commented 8 years ago

I built a trivial dataset to test how Trimmomatic clips perfectly sequenced adapters (no mismatches). My read files are laid out as follows:

fastq 1 (forward read)

5'ADAPTERfoo
bar
5'ADAPTERbaz

fastq 2 (reverse read)

foo
3'REVCOMPLEMENTADAPTERbar
3'REVCOMPLEMENTADAPTERbaz

The behavior I am observing appears to come from the Simple clipping mode applied to paired-end data; Palindromic mode is not engaged in this case. I was troubled to find that when contamination is present in only one read of a pair, that whole read is dropped. This behavior is documented in the manual.

However, this is undesirable in nearly all circumstances. Compare cutadapt, which clips only the adapter-matching sequence where it is found. In a very typical 2x100bp run, Trimmomatic will completely drop both reads if contamination is found at both the 5' and 3' ends. It will also drop a read if the whole adapter sequence is found in it. This is clearly not "clipping" the adapters from the reads.
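To make the difference concrete, here is a minimal Python sketch of the two clipping semantics being contrasted. It is a toy model and not the code of either tool: it uses exact string matching, and it assumes that Simple mode removes the adapter match together with everything 3' of it, which is consistent with the output below. The function names, `INSERT`, and `MINLEN` as a Python constant are hypothetical names chosen for illustration.

```python
# Toy model only: exact matching, no mismatch scoring, no quality handling.

def clip_match_to_end(read: str, adapter: str) -> str:
    """Assumed Simple-mode behavior: drop the adapter and everything 3' of it."""
    pos = read.find(adapter)
    return read if pos == -1 else read[:pos]


def clip_five_prime(read: str, adapter: str) -> str:
    """5'-adapter style clipping: drop the adapter and everything 5' of it."""
    pos = read.find(adapter)
    return read if pos == -1 else read[pos + len(adapter):]


ADAPTER = "CCTACGGGGTCAACCGCGCGAGGTATTTTTATCT"  # PE2_rc from adapters.fa
INSERT = "ACGTACGTACGTACGTACGTACGT"             # hypothetical genomic sequence
MINLEN = 20                                      # same value as MINLEN:20

read_5p = ADAPTER + INSERT  # adapter at the 5' end, as in test1.fastq
read_3p = INSERT + ADAPTER  # adapter at the 3' end

for label, read in (("5' adapter", read_5p), ("3' adapter", read_3p)):
    tail_clipped = clip_match_to_end(read, ADAPTER)
    print(label, "| match-to-end survives MINLEN:", len(tail_clipped) >= MINLEN)

# With the adapter at the 5' end, match-to-end clipping leaves nothing,
# while 5'-style clipping keeps the whole insert.
print(len(clip_match_to_end(read_5p, ADAPTER)), len(clip_five_prime(read_5p, ADAPTER)))
```

Under these assumptions, a read that starts with the full adapter is reduced to length zero and then fails `MINLEN`, which is how "clipping" turns into dropping the read.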

>java -jar ~/Projects/external_packages/Trimmomatic-0.32/trimmomatic-0.32.jar PE -threads 80 -phred33 -trimlog trimmoatic.log test1.fastq test2.fastq example1.fastq example1.unpaired.fastq example2.fastq example2.unpaired.fastq ILLUMINACLIP:adapters.fa:1:20:20:12:true MINLEN:20
TrimmomaticPE: Started with arguments: -threads 80 -phred33 -trimlog trimmoatic.log test1.fastq test2.fastq example1.fastq example1.unpaired.fastq example2.fastq example2.unpaired.fastq ILLUMINACLIP:adapters.fa:1:20:20:12:true MINLEN:20
Using PrefixPair: 'TTACTATTTTTAAACCTAGAACGCAGGATATAAC' and 'AGATAAAAATACCTCGCGCGGTTGACCCCGTAGG'
Using Long Clipping Sequence: 'CCTACGGGGTCAACCGCGCGAGGTATTTTTATCT'
Using Long Clipping Sequence: 'TTACTATTTTTAAACCTAGAACGCAGGATATAAC'
Using Long Clipping Sequence: 'AGATAAAAATACCTCGCGCGGTTGACCCCGTAGG'
Using Long Clipping Sequence: 'GTTATATCCTGCGTTCTAGGTTTAAAAATAGTAA'
ILLUMINACLIP: Using 1 prefix pairs, 4 forward/reverse sequences, 0 forward only sequences, 0 reverse only sequences
Input Read Pairs: 2 Both Surviving: 0 (0.00%) Forward Only Surviving: 0 (0.00%) Reverse Only Surviving: 1 (50.00%) Dropped: 1 (50.00%)
TrimmomaticPE: Completed successfully

adapters.fa

>PrefixPE/1
TTACTATTTTTAAACCTAGAACGCAGGATATAAC
>PrefixPE/2
AGATAAAAATACCTCGCGCGGTTGACCCCGTAGG
>PE1
TTACTATTTTTAAACCTAGAACGCAGGATATAAC
>PE1_rc
GTTATATCCTGCGTTCTAGGTTTAAAAATAGTAA
>PE2
AGATAAAAATACCTCGCGCGGTTGACCCCGTAGG
>PE2_rc
CCTACGGGGTCAACCGCGCGAGGTATTTTTATCT
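As a sanity check on the adapter file, the following Python sketch verifies that each `_rc` entry is the reverse complement of its partner. The `reverse_complement` helper and the `ADAPTERS` dictionary are illustrative names, not part of Trimmomatic; the sequences are copied from adapters.fa above.

```python
# Translation table for the four unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]


# Sequences copied from adapters.fa.
ADAPTERS = {
    "PE1": "TTACTATTTTTAAACCTAGAACGCAGGATATAAC",
    "PE1_rc": "GTTATATCCTGCGTTCTAGGTTTAAAAATAGTAA",
    "PE2": "AGATAAAAATACCTCGCGCGGTTGACCCCGTAGG",
    "PE2_rc": "CCTACGGGGTCAACCGCGCGAGGTATTTTTATCT",
}

for name in ("PE1", "PE2"):
    matches = reverse_complement(ADAPTERS[name]) == ADAPTERS[name + "_rc"]
    print(name, "reverse complement matches", name + "_rc:", matches)
```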

test1.fastq

@A2RC-4
CCTACGGGGTCAACCGCGCGAGGTATTTTTATCTCAATTGGTTTTTTTCCTCCGGTATGGAAGCCCCCAATGGTTGCATACTACCGACTCGTCCTTATGA
+
GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG
@A2RC-5
CCTACGGGGTCAACCGCGCGAGGTATTTTTATCTCAATTGGTTTTTTTCCTCCGGTATGGAAGCCCCCAATGGTTGCATACTACCGACTCGTCCTTATGA
+
GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG

test2.fastq

@A2RC-4
ACATTTTAGCGCCGGGTCGGGTGTGATAGAGTTTATGTCACCAATGCGTTTGGCTCTTGGAGATAGTCTTCATAAGGATTTGTAAGTAGACCAGATCACA
+
GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG
@A2RC-5
TTACTATTTTTAAACCTAGAACGCAGGATATAACCACTTGTATGGACTGAACAGATCGAAATGCACTCCCGGCGGATTATCTGGAAGTCTGCGGAGAGAC
+
GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG
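The survival counts in the summary line can be reproduced with a small Python sketch. This is a toy model under stated assumptions, not Trimmomatic's algorithm: adapters are found by exact string match, a match removes the adapter and everything 3' of it, and a read shorter than 20 bases after clipping is dropped. The names `clip_match_to_end`, `classify`, and `PAIRS` are hypothetical; the sequences are copied from the files above.

```python
from collections import Counter

# The four long clipping sequences from adapters.fa.
ADAPTERS = [
    "TTACTATTTTTAAACCTAGAACGCAGGATATAAC",  # PE1
    "GTTATATCCTGCGTTCTAGGTTTAAAAATAGTAA",  # PE1_rc
    "AGATAAAAATACCTCGCGCGGTTGACCCCGTAGG",  # PE2
    "CCTACGGGGTCAACCGCGCGAGGTATTTTTATCT",  # PE2_rc
]
MINLEN = 20  # same value as MINLEN:20 on the command line

FWD = ("CCTACGGGGTCAACCGCGCGAGGTATTTTTATCTCAATTGGTTTTTTTCCTCCGGTATGG"
       "AAGCCCCCAATGGTTGCATACTACCGACTCGTCCTTATGA")
REV_4 = ("ACATTTTAGCGCCGGGTCGGGTGTGATAGAGTTTATGTCACCAATGCGTTTGGCTCTTGG"
         "AGATAGTCTTCATAAGGATTTGTAAGTAGACCAGATCACA")
REV_5 = ("TTACTATTTTTAAACCTAGAACGCAGGATATAACCACTTGTATGGACTGAACAGATCGAA"
         "ATGCACTCCCGGCGGATTATCTGGAAGTCTGCGGAGAGAC")

# (read name, forward read, reverse read); both forward reads are identical.
PAIRS = [("A2RC-4", FWD, REV_4), ("A2RC-5", FWD, REV_5)]


def clip_match_to_end(read: str) -> str:
    """Assumed behavior: cut at the earliest exact adapter match, keep the 5' part."""
    hits = [pos for pos in (read.find(a) for a in ADAPTERS) if pos != -1]
    return read[:min(hits)] if hits else read


def classify(fwd: str, rev: str) -> str:
    """Label a pair by which mates still meet MINLEN after clipping."""
    fwd_ok = len(clip_match_to_end(fwd)) >= MINLEN
    rev_ok = len(clip_match_to_end(rev)) >= MINLEN
    if fwd_ok and rev_ok:
        return "both"
    if fwd_ok:
        return "forward_only"
    if rev_ok:
        return "reverse_only"
    return "dropped"


counts = Counter(classify(fwd, rev) for _, fwd, rev in PAIRS)
print(dict(counts))
```

Under these assumptions the sketch yields one reverse-only pair (A2RC-4) and one dropped pair (A2RC-5), the same tallies as the summary line in the log above.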

timflutre commented 8 years ago

As explained in the README, I am not the official maintainer of Trimmomatic. For this issue, you should contact Anthony Bolger (http://www.usadellab.org/cms/index.php?page=trimmomatic). Note also that I made this repo with version 0.33 of the code, but I can see that it is now at version 0.36.