usadellab / Trimmomatic

Quality score autodetection fails for Element Biosciences AVITI data #42

Open michellescribner opened 1 year ago

michellescribner commented 1 year ago

Trimmomatic fails on sequencing data generated by the Element Biosciences AVITI instrument unless the quality encoding (Phred33 or Phred64) is specified explicitly. For Illumina data, the encoding is autodetected.

"Error: Unable to detect quality encoding"

Adding the "-phred33" argument solves the issue.

TonyBolger commented 1 year ago

Unfortunately, it's not always possible to tell the quality offset of a FASTQ file, since the symbols used by the two encodings overlap. Trimmomatic relies on finding some informative quality symbols within the early part of the file - either a low-quality base in PHRED-33 (which would imply a negative quality score in PHRED-64) or a high-quality base in PHRED-64 (which would give a quality score >40 in PHRED-33).

A FASTQ file encoded as PHRED-33 with quality scores 31-40 looks the same as a PHRED-64 file with quality scores 0-9: both use only the ASCII characters '@' through 'I' (codes 64-73). I guess your file falls into this category (at least in the first chunk).
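
To make the overlap concrete, here is a minimal Python sketch of the kind of heuristic described above. It is not Trimmomatic's actual code: the function name `guess_phred_offset` and the cutoffs (ASCII below 64 means PHRED-33, ASCII above 73 means PHRED-64) are assumptions derived only from the description in this thread.

```python
def guess_phred_offset(quality_strings):
    """Guess the Phred offset from a sample of FASTQ quality lines.

    Hypothetical helper, not Trimmomatic's implementation.
    Returns 33, 64, or None when the sample is ambiguous.
    """
    codes = [ord(ch) for line in quality_strings for ch in line]
    if not codes:
        return None  # nothing to inspect

    # A symbol below ASCII 64 would be a negative score in PHRED-64,
    # so the file must be PHRED-33.
    if min(codes) < 64:
        return 33

    # A symbol above ASCII 73 would be a score >40 in PHRED-33,
    # which this sketch (following the comment above) treats as PHRED-64.
    if max(codes) > 33 + 40:
        return 64

    # Every symbol lies in '@'..'I' (64..73): valid under both encodings.
    return None


# PHRED-33 scores 31-40 and PHRED-64 scores 0-9 yield identical strings.
as_phred33 = "".join(chr(33 + q) for q in range(31, 41))
as_phred64 = "".join(chr(64 + q) for q in range(0, 10))
```

Under these assumptions, a sample made up only of symbols in the shared range returns `None`, which corresponds to the "Unable to detect quality encoding" case. Passing the offset explicitly, as with `-phred33` above, avoids the guess altogether.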