usadellab / Trimmomatic


phred33/64 quality score autodetection #1

Closed by bounlu 3 years ago

bounlu commented 3 years ago

Has the autodetection for phred quality score been implemented already?

tfmorris commented 3 years ago

See https://github.com/usadellab/Trimmomatic/blob/d89f8b7acfa8279aa1230e40af0093ac55c931d5/versionHistory.txt#L34

TonyBolger commented 3 years ago

Thanks tfmorris.

Most datasets should 'just work', but it will complain if it can't guess which encoding was used. Detection relies on finding bases within the first 10k reads whose quality characters are unique to one encoding. ASCII codes of 58 and below can only be phred-33: read as phred-64 they would mean a Q-score of -6 or lower, and AFAIK -5 was the lowest ever used. Codes of 80 and above can only be phred-64: read as phred-33 they would mean a Q-score of 47 or higher, which so far hasn't been seen in Illumina data. If detection fails (or guesses wrong for whatever reason), you'll need to specify the encoding manually with -phred33 or -phred64.
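A minimal Python sketch of the heuristic described above, assuming standard four-line FASTQ records (no wrapped sequence or quality lines). The names `quality_lines` and `guess_phred_offset` are hypothetical. This is not Trimmomatic's actual Java implementation, and returning `None` when both kinds of characters appear is an assumption about how a contradictory file should be handled.

```python
from itertools import islice
from typing import Iterable, Iterator, Optional

# Thresholds from the description above (ASCII codes, not Q-scores).
PHRED33_ONLY_MAX = 58  # as phred-64 this would be Q <= -6
PHRED64_ONLY_MIN = 80  # as phred-33 this would be Q >= 47


def quality_lines(fastq_lines: Iterable[str]) -> Iterator[str]:
    """Yield the quality line (4th line) of each record.

    Assumes unwrapped four-line FASTQ records.
    """
    for i, line in enumerate(fastq_lines):
        if i % 4 == 3:
            yield line.rstrip("\n")


def guess_phred_offset(quals: Iterable[str], max_reads: int = 10_000) -> Optional[int]:
    """Return 33 or 64 if the encoding can be inferred, else None."""
    saw33 = saw64 = False
    # Only inspect the first max_reads quality strings, like the 10k-read window.
    for qual in islice(quals, max_reads):
        for ch in qual:
            code = ord(ch)
            if code <= PHRED33_ONLY_MAX:
                saw33 = True
            elif code >= PHRED64_ONLY_MIN:
                saw64 = True
    if saw33 and not saw64:
        return 33
    if saw64 and not saw33:
        return 64
    # Ambiguous (no decisive characters) or contradictory (both seen):
    # the user has to pick the encoding explicitly.
    return None
```

Characters between the two thresholds, such as `I` (ASCII 73), are valid in both encodings and so never decide the outcome. A file whose first 10k reads contain only such characters is exactly the case where the encoding must be given by hand.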