mourisl / Lighter

Fast and memory-efficient sequencing error corrector
GNU General Public License v3.0

Unintentional trimming of sequences #20

Closed (schmeing closed 8 years ago)

schmeing commented 8 years ago

Hi,

I tried to correct the publicly available SRR001665 dataset with lighter using the following command:

nice -10 lighter -r ../SRR001665_1.fastq.gz -r ../SRR001665_2.fastq.gz -k 13 4600000 0.04 -t 64 -od k13/ 2>&1 | tee k13/lighter.log

The correction runs through without problems, but the resulting fastq files contain 25 and 42 unintentionally trimmed sequences, respectively, like this one:

@SRR001665.72513 071112_SLXA-EAS1_s_4:1:6:808:233 length=36 cor badprefix=7 ak
GCGTGCCGAAGTTAGTGGGCCTGGAGAATC
+
IIIIIIIIIIIIIIIIII3?I/%.IIII_IIC4I'

All 36 quality scores are still there, but the last bases of the sequence (6 in this case) have been trimmed.
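For anyone who wants to check their own output for the same symptom, here is a minimal sketch that lists records whose sequence and quality lines differ in length. It is not part of Lighter. The helper names `open_fastq` and `find_length_mismatches` are made up for illustration, and the code assumes the simple four-line FASTQ layout with no wrapped sequence lines.

```python
import gzip
import io
from typing import Iterator, TextIO, Tuple


def open_fastq(path: str) -> TextIO:
    """Open a plain or gzip-compressed FASTQ file in text mode (hypothetical helper)."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def find_length_mismatches(handle: TextIO) -> Iterator[Tuple[str, int, int]]:
    """Yield (header, sequence length, quality length) for inconsistent records.

    Assumption: every record is exactly four lines (header, sequence, '+', quality).
    Reading stops at end of file or at the first empty header line.
    """
    while True:
        header = handle.readline().rstrip("\r\n")
        if not header:
            break
        seq = handle.readline().rstrip("\r\n")
        handle.readline()  # skip the '+' separator line
        qual = handle.readline().rstrip("\r\n")
        if len(seq) != len(qual):
            yield header, len(seq), len(qual)


# Small in-memory example: the second record has 3 bases but 5 quality scores.
sample = io.StringIO("@r1\nACGT\n+\nIIII\n@r2\nACG\n+\nIIIII\n")
for header, n_bases, n_quals in find_length_mismatches(sample):
    print(header, n_bases, n_quals)
```

To scan a real file, pass `open_fastq(path)` to `find_length_mismatches` in place of the in-memory sample, with `path` pointing at one of the corrected output files.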

The output is:

[2016-03-22 16:44:22] =============Start====================
[2016-03-22 16:44:24] Bad quality threshold is "&"
[2016-03-22 16:45:33] Finish sampling kmers
[2016-03-22 16:45:33] Bloom filter A's false positive rate: 0.001899
[2016-03-22 16:47:13] Finish storing trusted kmers
[2016-03-22 16:52:13] Finish error correction
Processed 20816448 reads:
18328409 are error-free
Corrected 3617298 bases(1.453875 corrections for reads with errors)
Trimmed 0 reads with average trimmed bases 0.000000
Discard 0 reads

mourisl commented 8 years ago

I'll download that data set and take a look at it.

Thanks for letting me know.

mourisl commented 8 years ago

I think I've fixed the bug. Can you pull the new version and give it a try?

Thanks.

schmeing commented 8 years ago

Works fine for me now. Awesome how fast you fixed it. Thanks