najoshi / sickle

Windowed Adaptive Trimming for fastq files using quality
MIT License

phred33 quality scores #13

Closed fgvieira closed 11 years ago

fgvieira commented 11 years ago

Does sickle support phred33 scores? I thought it did, but when I tried it on some Illumina reads I got this message:

ERROR: Quality value (42) does not fall within correct range for Illumina encoding.
Range for Illumina encoding: 64-110
FastQ record: FC301W6AAXX:7:1:5:265#0/1
Quality string: _0055DGUWZY_NTYXVVVOZBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB5500_
Quality char: '*'
Quality position: 1

najoshi commented 11 years ago

You need to choose "sanger" for the quality encoding. It is a little confusing, but phred+33 is sanger. The "illumina" encoding is phred+64, which is the encoding Illumina used before CASAVA 1.8.

(Emailed reply to Filipe G. Vieira's comment of Thu, May 2, 2013 at 4:00 PM.)

Nikhil Joshi
Bioinformatics Analyst/Programmer
UC Davis Bioinformatics Core
http://bioinformatics.ucdavis.edu/
najoshi -at- ucdavis -dot- edu
530.752.2698 (w)
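To make the offset difference concrete, here is a minimal Python sketch that decodes one quality character under both encodings. It is not sickle's code; `OFFSETS` and `phred_score` are hypothetical names. The '*' from the first error is ASCII 42: a plausible Q9 under phred+33, but a negative score under phred+64, which is why the "illumina" range check rejects it.

```python
# Hypothetical helper: ASCII offsets for the two encodings discussed above.
# "sanger" is phred+33; "illumina" (pre-CASAVA 1.8) is phred+64.
OFFSETS = {"sanger": 33, "illumina": 64}


def phred_score(qual_char: str, encoding: str) -> int:
    """Return the Phred score that qual_char encodes under the given encoding."""
    return ord(qual_char) - OFFSETS[encoding]


if __name__ == "__main__":
    # '*' is the quality character reported in the first error message.
    for enc in ("sanger", "illumina"):
        print(enc, phred_score("*", enc))  # sanger 9, then illumina -22
```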

fgvieira commented 11 years ago

Actually, I did try sanger, but got a similar error; only now the quality score is too high:

ERROR: Quality value (85) does not fall within correct range for Sanger encoding.
Range for Sanger encoding: 33-80
FastQ record: FC301W6AAXX:7:1:5:265#0/1
Quality string: _0055DGUWZY_NTYXVVVOZBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB5500_
Quality char: 'U'
Quality position: 11

najoshi commented 11 years ago

Hi Filipe,

Are you sure this is an Illumina file? Which platform is it from? That range of quality is strange. Can you send me a snippet of the data, maybe 10 records before and after the record with the error?

(Emailed reply to Filipe G. Vieira's comment of Thu, May 2, 2013 at 4:09 PM.)

fgvieira commented 11 years ago

I sent you an email but I'm not sure you got it. Either way, here are a couple of reads:

@FC301W6AAXX:7:1:5:1396#0/1
TGAAGANNTCCACTCCTCCCATAGTCATGACCACCTNCTCCGCATTAAAACTCTTTATAAAAATGACCCAGAT
+
aVbaWYDDZa`aaaaaaBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
@FC301W6AAXX:7:1:5:1161#0/1
ATAATTNNAACTGACTTACAATACCGTCCTAGTGTTNACGCTATTTATTTCCTTAATGGAAGTGTACAAATTG
+
]G[[DPDD^a\O^_bXBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
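One way to check which encoding a file is compatible with is to look at the range of quality characters it actually contains. A minimal Python sketch, assuming the inclusive ranges printed in sickle's error messages in this thread; `RANGES`, `observed_range` and `compatible_encodings` are hypothetical names, not part of sickle, and the input is assumed to be non-empty.

```python
# Inclusive ASCII ranges as printed in sickle's error messages in this thread.
RANGES = {"sanger": (33, 80), "illumina": (64, 110)}


def observed_range(qual_strings):
    """Return (min, max) ASCII codes across a non-empty iterable of quality strings."""
    codes = [ord(c) for q in qual_strings for c in q]
    return min(codes), max(codes)


def compatible_encodings(qual_strings):
    """List encodings whose range covers every observed quality character."""
    lo, hi = observed_range(qual_strings)
    return [name for name, (rmin, rmax) in RANGES.items() if rmin <= lo and hi <= rmax]


if __name__ == "__main__":
    print(observed_range(["aVbaWYDDZa`aaaaaaBBBB"]))        # (66, 98)
    print(compatible_encodings(["aVbaWYDDZa`aaaaaaBBBB"]))  # ['illumina']
    # A string holding both '*' (42) and 'U' (85), like the failing record.
    print(compatible_encodings(["*U"]))                      # []
```

Both reads posted above span 'B' (66) to 'b' (98), inside the illumina range. A quality string containing both '*' (42) and 'U' (85), as record FC301W6AAXX:7:1:5:265#0/1 did according to the two errors above, fits neither range.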

ysurget commented 11 years ago

I have the same problem. It seems that the quality range in sickle is not large enough. Phred quality ranges from 0 to 93 (Sanger-encoded as 33 to 126), while sickle caps it at 47 (Sanger-encoded as 80). It is easy to fix: just change line 85 of sickle.h to {33, 33, 126}, /* SANGER */ and recompile.
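The arithmetic behind this suggestion can be checked with a small Python sketch. It only mirrors the two Sanger upper bounds discussed here; the names are hypothetical and this is not sickle's code. It shows that an upper bound of 80 admits at most Q47 and rejects 'U' (85), while 126 admits up to Q93.

```python
# Sanger (phred+33) bounds before and after the sickle.h change suggested above.
SANGER_OFFSET = 33
SANGER_OLD = (33, 80)   # ASCII '!' .. 'P'
SANGER_NEW = (33, 126)  # ASCII '!' .. '~'


def in_range(qual_char, bounds):
    """True if the character's ASCII code lies within the inclusive bounds."""
    lo, hi = bounds
    return lo <= ord(qual_char) <= hi


def max_phred(bounds):
    """Highest Phred score a bounds pair admits under phred+33."""
    return bounds[1] - SANGER_OFFSET


if __name__ == "__main__":
    print(max_phred(SANGER_OLD), max_phred(SANGER_NEW))  # 47 93
    # 'U' (ASCII 85) is the character rejected in the Sanger error above.
    print(in_range("U", SANGER_OLD), in_range("U", SANGER_NEW))  # False True
```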

najoshi commented 11 years ago

fixed