puli1027 closed this issue 2 months ago.
The data source is https://ftp.ncbi.nih.gov/snp/latest_release/VCF/
Thanks for the report and example!
This is likely an invalid INFO field value. While the range is unspecified in VCF 4.2, VCF 4.3 clarifies that the Integer type is a 32-bit signed integer (§ 1.3 "Data types" (2022-11-27)):
Data types supported by VCF are: Integer (32-bit, signed)...
See also § 7.2 "Changes between VCFv4.2 and VCFv4.3" (2022-11-27):
In order for VCF and BCF to have the same expressive power, we state explicitly that Integers and Floats are 32-bit numbers. Integers are signed.
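To make the range problem concrete, here is a small std-only Rust sketch (not noodles code) that checks the RS value from the report against `i32`. `fits_in_i32` is a hypothetical helper for illustration. Rust's `ParseIntError` for an over-range value displays "number too large to fit in target type", which matches the error in the original report.

```rust
use std::num::IntErrorKind;

/// Hypothetical helper: true if `s` parses as a 32-bit signed integer,
/// i.e., the VCF 4.3 Integer type.
fn fits_in_i32(s: &str) -> bool {
    s.parse::<i32>().is_ok()
}

fn main() {
    let rs = "2148352434";
    println!("i32::MAX = {}", i32::MAX); // 2147483647

    match rs.parse::<i32>() {
        Ok(n) => println!("fits: {n}"),
        Err(e) => {
            // The value is above i32::MAX, so the error kind is PosOverflow.
            assert_eq!(e.kind(), &IntErrorKind::PosOverflow);
            println!("error: {e}");
        }
    }

    // A 64-bit integer holds the value without loss.
    let wide: i64 = rs.parse().expect("value fits in i64");
    println!("as i64: {wide}, fits in i32: {}", fits_in_i32(rs));
}
```

The value exceeds `i32::MAX` by 868,787, so no 32-bit signed representation exists.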
By default, htslib cannot represent this value; it emits a warning and sets the field to missing:
$ bcftools --version
bcftools 1.21
Using htslib 1.21
$ bcftools view --no-header 302.vcf
[W::vcf_parse_info] Extreme INFO/RS value encountered and set to missing at sq0:1
sq0 1 . A . . . RS=.
In noodles, I recommend redefining the RS type as a string. If you need the value, parse it manually into a wider integer type, e.g.,
use noodles_vcf::{
    self as vcf, header::record::value::map::info::Type, variant::record_buf::info::field::Value,
};

// Note: the VCF header and record lines must be tab-delimited.
const DATA: &[u8] = br#"##fileformat=VCFv4.2
##INFO=<ID=RS,Number=1,Type=Integer,Description="dbSNP ID (i.e. rs number)">
#CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO
sq0	1	.	A	.	.	.	RS=2148352434
"#;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let mut reader = vcf::io::Reader::new(DATA);
    let mut header = reader.read_header()?;

    // Redefine RS as a String so the reader does not try to fit it into an i32.
    if let Some(rs) = header.infos_mut().get_mut("RS") {
        *rs.type_mut() = Type::String;
    }

    for result in reader.record_bufs(&header) {
        let record = result?;
        let info = record.info();

        // RS is now a string; parse it manually into a wider integer type.
        if let Some(Some(Value::String(value))) = info.get("RS") {
            dbg!(value.parse::<i64>())?;
        }
    }

    Ok(())
}
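If you only need the RS number from an already-extracted INFO column, the same idea can be sketched without noodles by splitting the raw string yourself. This is a minimal std-only sketch under stated assumptions: `info_value` and `parse_rs` are hypothetical helpers, the INFO column is assumed to be well-formed and `;`-separated like the one in the original report, and percent-encoding and multi-valued fields are not handled.

```rust
/// Hypothetical helper: returns the raw value of `key` in a VCF INFO column
/// such as "RS=1;VC=SNV;INT". Flag fields (no '=') carry no value and are skipped.
fn info_value<'a>(info: &'a str, key: &str) -> Option<&'a str> {
    info.split(';')
        .filter_map(|field| field.split_once('='))
        .find(|(k, _)| *k == key)
        .map(|(_, v)| v)
}

/// Hypothetical helper: parses RS as an i64, since dbSNP IDs can exceed i32::MAX.
fn parse_rs(info: &str) -> Option<i64> {
    info_value(info, "RS")?.parse().ok()
}

fn main() {
    let info = "RS=2148352434;dbSNPBuildID=156;SSR=0;GENEINFO=GPR153:387509;VC=SNV;INT;GNO;FREQ=1000Genomes:0.9998,0.0001562";
    match parse_rs(info) {
        Some(rs) => println!("rs{rs}"),
        None => println!("RS missing or not an integer"),
    }
}
```

Changing the header type in noodles, as above, is still the more robust choice because the reader then handles the rest of the record for you.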
@zaeleus I understand now. Thank you for your reply.
error: number too large to fit in target type
Data: Info("RS=2148352434;dbSNPBuildID=156;SSR=0;GENEINFO=GPR153:387509;VC=SNV;INT;GNO;FREQ=1000Genomes:0.9998,0.0001562")
I found the definition at https://docs.rs/noodles/0.82.0/noodles/vcf/variant/record_buf/info/field/value/enum.Value.html: an i32 is too small to hold 2148352434.