Closed J-Wall closed 1 year ago
as far as I know there is no formal spec for BAM [sic] files (...)
Yes, there has been a fairly official BED spec since 2022: https://github.com/samtools/hts-specs/blob/master/BEDv1.pdf. Section 1.5 specifies that the score field should be represented as an integer with values between 0 and 1000.
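To make the constraint concrete, here is a minimal sketch of a check that follows that description: an integer in the inclusive range 0 to 1000. The helper name parse_spec_score is hypothetical and uses only the standard library; it is not how noodles-bed or any other parser implements it.

```rust
/// Hypothetical helper: accept a score only if it is an integer
/// in the inclusive range 0..=1000, as the spec text describes.
fn parse_spec_score(field: &str) -> Result<u16, String> {
    // u16 is wide enough for 0..=1000; anything non-integer fails here.
    let value: u16 = field
        .parse()
        .map_err(|e| format!("invalid score {:?}: {}", field, e))?;

    if value <= 1000 {
        Ok(value)
    } else {
        Err(format!("score {} is outside 0..=1000", value))
    }
}

fn main() {
    assert_eq!(parse_spec_score("0"), Ok(0));
    assert_eq!(parse_spec_score("1000"), Ok(1000));
    // In range for u16, but outside the range the spec allows.
    assert!(parse_spec_score("1001").is_err());
}
```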
OK, thanks. Given that most software in existence was written before 2022, I'm not sure there is strong justification for strictly adhering to it, but it's not my library... This overall issue is related to #111
noodles-bed follows hts-specs' description, which, as above, defines the score values to be integers. I recommend opening an issue in samtools/hts-specs if the value type is incorrect or fails to cover common usages.
One workaround is to parse the data as BED4+ records instead of BED5+ and above. This will parse the first set of standard fields and put the custom fields in Record::<4>::optional_fields, e.g.,

    use noodles_bed as bed;

    fn main() -> Result<(), bed::record::ParseError> {
        const DATA: &str = "sq0\t8\t13\t.\t21.0";

        assert!(DATA.parse::<bed::Record<5>>().is_err());

        let record: bed::Record<4> = DATA.parse()?;
        dbg!(record.reference_sequence_name()); // => "sq0"
        dbg!(record.start_position()); // => 9
        dbg!(record.end_position()); // => 13
        dbg!(record.name()); // => None
        dbg!(record.optional_fields().get(0)); // => Some("21.0")

        Ok(())
    }
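With this workaround the score arrives as a raw string, so converting it to a float is left to the caller. A minimal sketch of that step, using only the standard library: score_as_f64 is a hypothetical helper, and the Vec<String> is a stand-in for the optional fields, not the noodles-bed type.

```rust
use std::num::ParseFloatError;

/// Hypothetical helper: interpret a raw optional field as a
/// floating-point score.
fn score_as_f64(raw: &str) -> Result<f64, ParseFloatError> {
    raw.trim().parse::<f64>()
}

fn main() -> Result<(), ParseFloatError> {
    // Stand-in for the optional fields of a BED4+ record (assumption:
    // the first optional field holds the non-integer score).
    let optional_fields = vec![String::from("21.0")];

    // Option<Result<..>> -> Result<Option<..>> so that `?` can be used.
    let score: Option<f64> = optional_fields
        .first()
        .map(|raw| score_as_f64(raw))
        .transpose()?;

    assert_eq!(score, Some(21.0));
    Ok(())
}
```

A missing-value placeholder such as "." is not a valid float, so callers would need to decide how to treat it before calling the helper.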
Thanks heaps, that's very helpful
I have BED files with non-integer values in the score field. noodles tries to parse these into u16 and fails with Error: Custom { kind: InvalidData, error: InvalidScore(Parse(ParseIntError { kind: InvalidDigit })) }. I am wondering whether it would be better to use a floating-point representation of the scores. As far as I know there is no formal spec for BAM files, and the closest thing to one doesn't explicitly say they should be integers.
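For reference, the InvalidDigit part of that error can be reproduced with the standard library alone. This sketch assumes the score is ultimately handed to an integer parse like str::parse::<u16>, which the error message suggests but which is not confirmed here. The helper score_error_kind is hypothetical.

```rust
use std::num::IntErrorKind;

/// Hypothetical helper: return the kind of integer-parse error a score
/// field produces, or None if it parses cleanly as a u16.
fn score_error_kind(field: &str) -> Option<IntErrorKind> {
    field.parse::<u16>().err().map(|e| e.kind().clone())
}

fn main() {
    // The decimal point is not a digit, so an integer parse rejects it.
    assert_eq!(score_error_kind("21.0"), Some(IntErrorKind::InvalidDigit));

    // The same value written as an integer parses without error.
    assert_eq!(score_error_kind("21"), None);
}
```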