samtools / bcftools

This is the official development repository for BCFtools. See installation instructions and other documentation here: http://samtools.github.io/bcftools/howtos/install.html
http://samtools.github.io/bcftools/

cmp_vector_strings() in filter.c complains that nvalues!=str_value.l for VCF lines missing the GT format #2314

Closed: freeseek closed this issue 6 days ago

freeseek commented 2 weeks ago

The following BCFtools command crashes:

(echo -e "##fileformat=VCFv4.2"
echo "##contig=<ID=chr1>"
echo "##FORMAT=<ID=GT,Number=1,Type=String,Description=\"Genotype\">"
echo -e "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tSM"
echo -e "chr1\t1\t.\tA\tC\t.\t.\t.\tGT\t0|1"
echo -e "chr1\t2\t.\tG\tT\t.\t.\t.\t.\t.") | bcftools view --include 'GT="0|1"'
bcftools: filter.c:2669: cmp_vector_strings: Assertion `atok->nvalues==atok->str_value.l && btok->nvalues==btok->str_value.l' failed.
Aborted (core dumped)

The filtering framework requires that nvalues==str_value.l. However, when filters_set_genotype_string() in filter.c is called for a line without the GT format field, it sets tok->nvalues to 0 and returns immediately, without touching tok->str_value.l:

static void filters_set_genotype_string(filter_t *flt, bcf1_t *line, token_t *tok)
{
    bcf_fmt_t *fmt = bcf_get_fmt(flt->hdr, line, "GT");
    if ( !fmt )
    {
        tok->nvalues = 0;
        return;
    }
...
    tok->nvalues = tok->str_value.l;
    tok->nval1 = blen;
}

This can leave nvalues!=str_value.l (presumably because str_value.l still holds the length from an earlier line that did have GT), which then makes the assertion in cmp_vector_strings() in filter.c fail. I am not sure what the appropriate fix would be, but it might be enough for filters_set_genotype_string() to also set tok->str_value.l to zero when GT is not present.

pd3 commented 6 days ago

This is fixed now. Thank you for the excellent test case!