zaeleus / noodles

Bioinformatics I/O libraries in Rust
MIT License

QualityScore conversion from fastq #277

Status: Closed (closed by d-cameron 2 weeks ago)

d-cameron commented 1 month ago

I am attempting to convert FASTQ records to SAM, and I can't seem to find the API that performs the quality score Phred offset conversion. The raw SAM QualityScores does what I want, but it's not public, so I can't actually instantiate it. Is there an API to get a QualityScores from the u8 values returned by a FASTQ record's .quality_scores()? Even a constructor method where I had to explicitly set the FASTQ Phred offset (because of course it's bioinformatics, so it's not universally a +33 offset) would be useful.

d-cameron commented 1 month ago

Also, a sentence on the QualityScores trait explicitly stating the phred offset is 0 would be a useful clarification.

zaeleus commented 1 month ago

Yes, given that there is no standard quality score offset in FASTQ, noodles provides the raw quality scores. I suspect that a wrapper would not provide much of an improvement, e.g., quality_scores.iter_with_offset(33) vs. quality_scores.iter().map(|&n| n - 33).
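For readers who want the offset handled explicitly, the mapping mentioned above can be sketched with the standard library alone. This is only an illustration, not part of noodles: the helper name `decode_quality_scores` is hypothetical, and the input is assumed to be the raw bytes a FASTQ record exposes. Compared to a bare `n - 33`, it uses `checked_sub` so that a byte below the offset is reported instead of underflowing.

```rust
/// Hypothetical helper (not a noodles API): converts ASCII-encoded FASTQ
/// quality characters to raw Phred scores by subtracting `offset`
/// (e.g., 33 for the Sanger convention).
///
/// Returns `None` if any byte is smaller than `offset`; a plain `b - offset`
/// would panic in debug builds and wrap around in release builds.
fn decode_quality_scores(encoded: &[u8], offset: u8) -> Option<Vec<u8>> {
    encoded
        .iter()
        .map(|&b| b.checked_sub(offset))
        .collect()
}
```

Calling `decode_quality_scores(b"NDLS", 33)` yields `Some(vec![45, 35, 43, 50])`, while a different offset can be passed for data known to use another encoding.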

> The sam raw QualityScores does what I want but it's not public I can't actually instantiate it.

I increased the visibility of the constructor for sam::record::QualityScores in noodles 0.78.0/noodles-sam 0.62.0. You can now wrap FASTQ quality scores as SAM quality scores, e.g.,

use noodles::{
    fastq::{self, record::Definition},
    sam::{alignment::record::QualityScores as _, record::QualityScores},
};

let record = fastq::Record::new(Definition::new("r0", ""), "ACGT", "NDLS");
let quality_scores = QualityScores::new(record.quality_scores());
assert_eq!(quality_scores.iter().collect::<Vec<_>>(), [45, 35, 43, 50]);

Of course, this assumes the FASTQ quality scores follow the Sanger convention, i.e., a +33 offset.
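Since wrapping the bytes relies on that assumption, a caller may want a cheap sanity check first. The sketch below is an assumption-laden illustration rather than anything noodles provides: `looks_like_sanger` is a hypothetical name, and it only verifies that every byte falls in the printable range `!` to `~` (33 to 126) used by Phred+33 encoding. Passing the check does not prove the data is Phred+33, because Phred+64 characters fall inside the same range.

```rust
/// Hypothetical check (not a noodles API): returns `true` if every byte is
/// within the printable ASCII range '!'..='~' (33..=126) used by
/// Sanger-style (Phred+33) quality strings.
///
/// This is a necessary condition only; it cannot distinguish Phred+33 from
/// Phred+64 data, whose characters also lie in this range.
fn looks_like_sanger(encoded: &[u8]) -> bool {
    encoded.iter().all(|b| (b'!'..=b'~').contains(b))
}
```

For example, `looks_like_sanger(b"NDLS")` is `true`, whereas a string containing a space (byte 32) is rejected.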

> Also, a sentence on the QualityScores trait explicitly stating the phred offset is 0 would be a useful clarification.

I added a note, thanks!