usadellab / Trimmomatic

Input reads is obviously less than my data #8

Closed lubocoix closed 3 years ago

lubocoix commented 3 years ago

Trimmomatic java -jar trimmomatic-0.39.jar PE -threads 16 -phred33 /Users/lubo/RBH/RBH_9_1.filtlowGC.R1.fq.gz /Users/lubo/RBH/RBH_9_1.filtlowGC.R2.fq.gz /Users/lubo/RBH/RBH_9_1.filtlowGC.R1_paired.fq.gz /Users/lubo/RBH/RBH_9_1.filtlowGC.R1_unpaired.fq.gz /Users/lubo/RBH/RBH_9_1.filtlowGC.R2_paired.fq.gz /Users/lubo/RBH/RBH_9_1.filtlowGC.R2_unpaired.fq.gz ILLUMINACLIP:TruSeq3-PE.fa:2:30:10 LEADING:5 TRAILING:5 SLIDINGWINDOW:4:15 MINLEN:36

TrimmomaticPE: Started with arguments:
-threads 16 -phred33 /Users/lubo/RBH/RBH_9_1.filtlowGC.R1.fq.gz /Users/lubo/RBH/RBH_9_1.filtlowGC.R2.fq.gz /Users/lubo/RBH/RBH_9_1.filtlowGC.R1_paired.fq.gz /Users/lubo/RBH/RBH_9_1.filtlowGC.R1_unpaired.fq.gz /Users/lubo/RBH/RBH_9_1.filtlowGC.R2_paired.fq.gz /Users/lubo/RBH/RBH_9_1.filtlowGC.R2_unpaired.fq.gz ILLUMINACLIP:TruSeq3-PE.fa:2:30:10 LEADING:5 TRAILING:5 SLIDINGWINDOW:4:15 MINLEN:36
Using PrefixPair: 'TACACTCTTTCCCTACACGACGCTCTTCCGATCT' and 'GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT'
ILLUMINACLIP: Using 1 prefix pairs, 0 forward/reverse sequences, 0 forward only sequences, 0 reverse only sequences
Exception in thread "Thread-0" java.lang.RuntimeException: Sequence and quality length don't match: 'ACCATCATAGCAGCAGATCGCACACTGATGACC1101:211TGATGTACTCATAGAAT' vs 'FFFFFFFFFFFFFF'
    at org.usadellab.trimmomatic.fastq.FastqRecord.<init>(FastqRecord.java:25)
    at org.usadellab.trimmomatic.fastq.FastqParser.parseOne(FastqParser.java:89)
    at org.usadellab.trimmomatic.fastq.FastqParser.next(FastqParser.java:179)
    at org.usadellab.trimmomatic.threading.ParserWorker.run(ParserWorker.java:42)
    at java.base/java.lang.Thread.run(Thread.java:832)
Exception in thread "Thread-1" java.lang.RuntimeException: Sequence and quality length don't match: 'CCTCAGGCTTTGGCGGCTCAGGCTCCTCCTTCTCCTCTTCCTTCTTCTCCTCCGGCGGAGGCGGTATCGGCGACAAGAGCTCCACCTTGCGGCCGGTCTTCTTCTGGACGCGCTCCACCACCTA' vs 'FCGGGAGGCGCTTCTCGGCCTTGGGCG2FFFFFFFFFF:10936:1485T,FFF,FFFFFFFFFFFFFFFFFFFFFFCTTCTGATTTCAAATTTTGCATTGGTCG:AGTCATGGAC9CACATAAGCAGTGGCAC'
    at org.usadellab.trimmomatic.fastq.FastqRecord.<init>(FastqRecord.java:25)
    at org.usadellab.trimmomatic.fastq.FastqParser.parseOne(FastqParser.java:89)
    at org.usadellab.trimmomatic.fastq.FastqParser.next(FastqParser.java:179)
    at org.usadellab.trimmomatic.threading.ParserWorker.run(ParserWorker.java:42)
    at java.base/java.lang.Thread.run(Thread.java:832)
Input Read Pairs: 5000 Both Surviving: 4996 (99.92%) Forward Only Surviving: 4 (0.08%) Reverse Only Surviving: 0 (0.00%) Dropped: 0 (0.00%)
TrimmomaticPE: Completed successfully

My RNA-seq data is about 1.3 GB, but Trimmomatic reports only 5000 input read pairs. Why did this happen? I ran it in the terminal on my Mac. Could it be caused by my Mac having too little RAM?

TonyBolger commented 3 years ago

It appears that your input files are corrupted. These are detected as invalid records:

Exception in thread "Thread-0" java.lang.RuntimeException: Sequence and quality length don't match: 'ACCATCATAGCAGCAGATCGCACACTGATGACC1101:211TGATGTACTCATAGAAT' vs 'FFFFFFFFFFFFFF'

and

Exception in thread "Thread-1" java.lang.RuntimeException: Sequence and quality length don't match: 'CCTCAGGCTTTGGCGGCTCAGGCTCCTCCTTCTCCTCTTCCTTCTTCTCCTCCGGCGGAGGCGGTATCGGCGACAAGAGCTCCACCTTGCGGCCGGTCTTCTTCTGGACGCGCTCCACCACCTA' vs 'FCGGGAGGCGCTTCTCGGCCTTGGGCG2FFFFFFFFFF:10936:1485T,FFF,FFFFFFFFFFFFFFFFFFFFFFCTTCTGATTTCAAATTTTGCATTGGTCG:AGTCATGGAC9CACATAAGCAGTGGCAC'

Both of these seem to have part of the record name (1101:211 and :10936:1485T) mixed into the sequence or quality score line somehow. I'm guessing the files would also fail the gzip checksum test (gunzip -t).

Unfortunately, in these circumstances, this version of Trimmomatic doesn't properly report an error or clearly indicate that the output is probably useless. This problem should be fixed in the HEAD version.
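To locate records like the two above without running Trimmomatic, a small scan can compare the sequence and quality lengths of every record. The following is only a minimal Python sketch, not part of Trimmomatic: the function names `find_bad_records` and `check_fastq_gz` are hypothetical, and it assumes plain four-line FASTQ records with no line wrapping. If the gzip stream itself is damaged, reading may raise an exception before any bad record is reported.

```python
import gzip
import io


def find_bad_records(handle, limit=10):
    """Return (record_number, header, seq_len, qual_len) for malformed records.

    Assumes four-line FASTQ records: header, sequence, '+', quality.
    """
    bad = []
    record_number = 0
    while True:
        header = handle.readline()
        if not header:
            break  # clean end of input
        seq = handle.readline().rstrip("\n")
        plus = handle.readline().rstrip("\n")
        qual = handle.readline().rstrip("\n")
        record_number += 1
        header = header.rstrip("\n")
        if (not header.startswith("@") or not plus.startswith("+")
                or len(seq) != len(qual)):
            bad.append((record_number, header, len(seq), len(qual)))
            if len(bad) >= limit:
                break  # stop early; one bad record already means trouble
    return bad


def check_fastq_gz(path, limit=10):
    """Scan a gzip-compressed FASTQ file (hypothetical helper)."""
    with gzip.open(path, "rt", encoding="ascii", errors="replace") as handle:
        return find_bad_records(handle, limit)


if __name__ == "__main__":
    # In-memory example: the second record has 6 bases but only 3 quality values.
    sample = io.StringIO("@r1\nACGT\n+\nFFFF\n@r2\nACGTAC\n+\nFFF\n")
    print(find_bad_records(sample))  # prints [(2, '@r2', 6, 3)]
```

Once records are out of frame, every later record is usually reported as bad too, so the first entry in the result is the most informative one.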

lubocoix commented 3 years ago

Thanks a lot! This question has bothered me for a long time. The data was provided by my PI. How can I fix this problem? Can you give me some advice or suggest which tools I should use? Thanks again.

TonyBolger commented 3 years ago

If at all possible, you need to get an undamaged copy of the data. As a first step, I would suggest confirming whether the issue is detected by the checksum test (gunzip -t) on the same machine where this problem is seen. I'm 95% sure it will be, as this looks like corruption of the binary compressed data, not the original FASTQ. Then work 'upstream' to see if the problem exists there (e.g. on your P.I.'s copy of the data). Off the top of my head, the most likely causes of this issue are FTP transfers of the compressed files in ASCII mode or broken hardware (likely RAM).
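Where `gunzip -t` is not convenient, roughly the same check can be scripted. The sketch below is an illustration only, with hypothetical helper names (`gzip_is_intact`, `sha256_of_file`): it decompresses the whole file and discards the output, which makes Python's `gzip` module verify the stored CRC, and it computes a SHA-256 digest that can be compared against the upstream copy to see where the corruption entered. Using SHA-256 is an assumption here; any checksum that both sides can compute would do.

```python
import gzip
import hashlib
import zlib


def gzip_is_intact(path, chunk_size=1 << 20):
    """Return True if the file decompresses fully and passes its CRC check."""
    try:
        with gzip.open(path, "rb") as handle:
            # Read to the end and throw the data away; errors surface here.
            while handle.read(chunk_size):
                pass
    except (OSError, EOFError, zlib.error):
        # OSError covers a bad header or CRC mismatch, EOFError a truncated
        # file, zlib.error a damaged compressed stream.
        return False
    return True


def sha256_of_file(path, chunk_size=1 << 20):
    """Digest of the compressed file, for comparison with the upstream copy."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

If `gzip_is_intact` returns False locally but the upstream copy passes, and the two digests differ, the damage most likely happened during transfer or on the local machine.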

lubocoix commented 3 years ago

Got it, thanks again.
