rust-bio / rust-htslib

This library provides HTSlib bindings and a high level Rust API for reading and writing BAM files.
MIT License

Passing an INFO tag as a FORMAT tag fails silently instead of triggering an error #350

Open essut opened 2 years ago

essut commented 2 years ago

Hi, I am trying to solve issue rust-bio/rust-bio-tools#52, where the user triggered a code path that should not be possible: a FORMAT tag having a flag tag type.

I was able to reproduce this by passing an INFO tag as a FORMAT tag with rbt vcf-to-txt --fmt T < tests/test.vcf, which surprised me, since I expected rust-htslib to panic given that the tag does not exist as a FORMAT tag.

I then wrote my own code, using test.vcf as the input:

use rust_htslib::bcf::{Read, Reader};

fn main() {
    let mut bcf = Reader::from_path("test.vcf").unwrap_or_else(|e| panic!("{}", e));

    for record in bcf.records() {
        let record = record.unwrap();
        let tag = "T".as_bytes();

        println!("{:?}", record.header().info_type(tag));
        println!("{:#?}", record.info(tag));

        println!("{:?}", record.header().format_type(tag));
        println!("{:#?}", record.format(tag));
    }

    println!("{:#?}", bcf.header().header_records());
}

If "T" is interpreted as an INFO tag, all is well:

Ok((Integer, AltAlleles))
Info {
    record: Record {
        inner: 0x000055ac1a09e340,
        header: HeaderView {
            inner: 0x000055ac1a09cc20,
        },
    },
    tag: [
        84,
    ],
    buffer: Buffer {
        inner: 0x0000000000000000,
        len: 0,
    },
}

However, I am surprised that interpreting "T" as a FORMAT tag does not generate any errors:

Ok((Flag, Fixed(0)))
Format {
    record: Record {
        inner: 0x000055ac1a09e340,
        header: HeaderView {
            inner: 0x000055ac1a09cc20,
        },
    },
    tag: [
        84,
    ],
    inner: 0x0000000000000000,
    buffer: Buffer {
        inner: 0x0000000000000000,
        len: 0,
    },
}

Yet in the header information, "T" is identified as an INFO tag:

[
    Generic {
        key: "fileformat",
        value: "VCFv4.3",
    },
    Filter {
        key: "FILTER",
        values: {
            "ID": "PASS",
            "Description": "\"All filters passed\"",
            "IDX": "0",
        },
    },
    Contig {
        key: "contig",
        values: {
            "ID": "1",
            "IDX": "0",
        },
    },
    Format {
        key: "FORMAT",
        values: {
            "ID": "S",
            "Number": "1",
            "Type": "String",
            "Description": "\"Text\"",
            "IDX": "1",
        },
    },
    Format {
        key: "FORMAT",
        values: {
            "ID": "GT",
            "Number": "1",
            "Type": "String",
            "Description": "\"Genotype\"",
            "IDX": "2",
        },
    },
    Info {
        key: "INFO",
        values: {
            "ID": "T",
            "Number": "A",
            "Type": "Integer",
            "Description": "\"Text\"",
            "IDX": "3",
        },
    },
    Info {
        key: "INFO",
        values: {
            "ID": "SOMATIC",
            "Number": "0",
            "Type": "Flag",
            "Description": "\"Somatic variant\"",
            "IDX": "4",
        },
    },
]

I am not sure whether this is expected behaviour. If it is not, it would help to fix it here instead of relying on downstream tools to catch the error. I am also not sure whether this has been discussed before, so apologies if this is a duplicate.

Versions: rust-bio-tools 0.39.0, rust-htslib 0.38.2
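
To make the expected behaviour concrete, here is a minimal sketch of the kind of check a downstream tool could run before reading a FORMAT field. It does not use rust-htslib at all: Section, Declaration, TagError, lookup_tag and example_header are hypothetical names invented for illustration, and the header is modelled as plain data mirroring the FORMAT and INFO entries of the dump above. Under these assumptions, asking for "T" as a FORMAT tag yields an explicit error instead of a silent Ok.

```rust
// Hypothetical, simplified model of VCF header declarations.
// None of these types exist in rust-htslib; they only illustrate the check.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Section {
    Info,
    Format,
}

#[derive(Debug, Clone, PartialEq, Eq)]
struct Declaration {
    section: Section,
    id: String,
    value_type: String,
}

#[derive(Debug, PartialEq, Eq)]
enum TagError {
    /// The tag is not declared anywhere in the header.
    Undeclared(String),
    /// The tag is declared, but only in a different section.
    WrongSection { id: String, found: Section, wanted: Section },
}

/// Return the declaration of `id` in the `wanted` section, or an error that
/// says whether the tag is missing entirely or declared in another section.
fn lookup_tag<'a>(
    header: &'a [Declaration],
    id: &str,
    wanted: Section,
) -> Result<&'a Declaration, TagError> {
    if let Some(decl) = header.iter().find(|d| d.id == id && d.section == wanted) {
        return Ok(decl);
    }
    match header.iter().find(|d| d.id == id) {
        Some(other) => Err(TagError::WrongSection {
            id: id.to_string(),
            found: other.section,
            wanted,
        }),
        None => Err(TagError::Undeclared(id.to_string())),
    }
}

/// The FORMAT and INFO declarations from the header dump in this issue.
fn example_header() -> Vec<Declaration> {
    let entries = [
        (Section::Format, "S", "String"),
        (Section::Format, "GT", "String"),
        (Section::Info, "T", "Integer"),
        (Section::Info, "SOMATIC", "Flag"),
    ];
    entries
        .iter()
        .map(|(section, id, value_type)| Declaration {
            section: *section,
            id: id.to_string(),
            value_type: value_type.to_string(),
        })
        .collect()
}

fn main() {
    let header = example_header();
    let queries = [
        ("T", Section::Info),
        ("T", Section::Format),
        ("GT", Section::Format),
    ];
    for (id, section) in queries.iter() {
        match lookup_tag(&header, id, *section) {
            Ok(decl) => println!("{} as {:?}: type {}", id, section, decl.value_type),
            Err(e) => println!("{} as {:?}: error {:?}", id, section, e),
        }
    }
}
```

With a check like this, the "T" as FORMAT query is rejected up front, which is the behaviour I expected from the library itself.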

Meizuamy commented 1 year ago

Use std::str::from_utf8 to convert the BufferBacked value to &str:

use rust_htslib::bcf::{Read, Reader};

fn main() {
    let mut bcf = Reader::from_path("test.vcf").unwrap_or_else(|e| panic!("{}", e));

    for record in bcf.records() {
        let record = record.unwrap();
        let tag = "T".as_bytes();

        println!("{:?}", record.header().info_type(tag));
        println!("{:#?}", std::str::from_utf8(record.info(tag).string().expect("No T tag in info!").expect("This record T tag has no value!")));

        println!("{:?}", record.header().format_type(tag));
        println!("{:#?}", std::str::from_utf8(record.format(tag).string().expect("No T tag in format!")));
    }

    println!("{:#?}", bcf.header().header_records());
}
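
The from_utf8 step on its own can be sketched with only the standard library. decode_field is a hypothetical helper name, not part of rust-htslib, and the byte slices stand in for whatever raw field value the reader returns. The sketch reports invalid bytes as an error rather than panicking.

```rust
use std::str;

/// Decode raw field bytes into text. Invalid UTF-8 is returned as an error
/// message instead of causing a panic.
fn decode_field(raw: &[u8]) -> Result<&str, String> {
    str::from_utf8(raw).map_err(|e| format!("field is not valid UTF-8: {}", e))
}

fn main() {
    // 84 is the single byte shown in the `tag: [84]` debug output above.
    let tag: &[u8] = &[84];
    println!("{:?}", decode_field(tag)); // Ok("T")

    // Bytes that are not valid UTF-8 produce an Err instead of a panic.
    let bad: &[u8] = &[0xff, 0xfe];
    println!("{:?}", decode_field(bad));
}
```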