I'm running into a segmentation fault when using the new INFO/TAG=@file.txt filtering feature. I'm running bcftools v1.19, the release in which this feature became available, and I've tested both within a Singularity environment using the image hosted on quay.io and with a self-built Dockerfile/compilation.
Example of the issue, given the following minimal VCF:
##fileformat=VCFv4.2
##FILTER=<ID=PASS,Description="All filters passed">
##INFO=<ID=DP,Number=1,Type=Integer,Description="Approximate read depth (informative and non-informative); some reads may have been filtered based on mapq etc.">
##INFO=<ID=TAG,Number=1,Type=String,Description="This is an example of a string in info.">
##contig=<ID=chr1,length=248956422>
#CHROM POS ID REF ALT QUAL FILTER INFO
chr1 11558102 . G GT . PASS DP=61
chr1 11558105 . G C . PASS DP=61;TAG=Example
chr1 11558108 . A T . PASS DP=61;TAG=It
And a short file of desired strings:
$ cat strings_expected
Example
Something
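In case the tabs were lost when pasting, the two input files can be regenerated with a small script. This is only a sketch: the file names `example.vcf` and `strings_expected` follow the commands used in this report, and the helper name `write_inputs` is made up for illustration.

```python
from pathlib import Path

# Header lines copied from the minimal VCF in this report.
HEADER = [
    "##fileformat=VCFv4.2",
    '##FILTER=<ID=PASS,Description="All filters passed">',
    '##INFO=<ID=DP,Number=1,Type=Integer,Description="Approximate read depth '
    "(informative and non-informative); some reads may have been filtered "
    'based on mapq etc.">',
    '##INFO=<ID=TAG,Number=1,Type=String,Description="This is an example of a string in info.">',
    "##contig=<ID=chr1,length=248956422>",
    "\t".join(["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]),
]

# The first record has no TAG, which is the case that crashes.
RECORDS = [
    ("chr1", "11558102", ".", "G", "GT", ".", "PASS", "DP=61"),
    ("chr1", "11558105", ".", "G", "C", ".", "PASS", "DP=61;TAG=Example"),
    ("chr1", "11558108", ".", "A", "T", ".", "PASS", "DP=61;TAG=It"),
]


def write_inputs(directory="."):
    """Write example.vcf (tab-separated) and strings_expected into directory."""
    directory = Path(directory)
    vcf = directory / "example.vcf"
    strings = directory / "strings_expected"
    lines = HEADER + ["\t".join(record) for record in RECORDS]
    vcf.write_text("\n".join(lines) + "\n")
    strings.write_text("Example\nSomething\n")
    return vcf, strings


if __name__ == "__main__":
    write_inputs()
```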
The following bcftools command, run here under gdb to capture a backtrace, hits a segmentation fault:
$ gdb --args bcftools view --include 'INFO/TAG=@strings_expected' example.vcf
GNU gdb (Debian 10.1-1.7) 10.1.90.20210103-git
Copyright (C) 2021 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from bcftools...
(gdb) run
Starting program: /usr/local/bin/bcftools view --include INFO/TAG=@strings_expected example.vcf
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Program received signal SIGSEGV, Segmentation fault.
kh_get_str2int (key=0x0, h=0x5555557b4250) at htslib-1.19/htslib/khash_str2int.h:30
30 htslib-1.19/htslib/khash_str2int.h: No such file or directory.
(gdb) bt
#0 kh_get_str2int (key=0x0, h=0x5555557b4250) at htslib-1.19/htslib/khash_str2int.h:30
#1 khash_str2int_has_key (str=0x0, _hash=0x5555557b4250) at htslib-1.19/htslib/khash_str2int.h:69
#2 filters_cmp_string_hash (atok=<optimized out>, btok=0x5555557b4280, rtok=0x5555557b4400, line=0x5555557b4bc0) at filter.c:616
#3 0x0000555555597b6b in filter_test (filter=<optimized out>, line=line@entry=0x5555557b4bc0, samples=samples@entry=0x0) at filter.c:3905
#4 0x00005555555b0b87 in subset_vcf (args=0x555555788240, line=0x5555557b4bc0) at vcfview.c:334
#5 0x00005555555b2be5 in subset_vcf (line=0x5555557b4bc0, args=0x555555788240) at vcfview.c:318
#6 main_vcfview (argc=<optimized out>, argv=<optimized out>) at vcfview.c:819
#7 0x00007ffff75f5d0a in __libc_start_main () from /lib/x86_64-linux-gnu/libc.so.6
#8 0x00005555555690ba in _start ()
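For anyone who wants to script this as a regression check, the crash can be detected from the exit status. A minimal sketch, assuming a POSIX system, where Python's `subprocess` reports a child killed by a signal as a negative return code. The helper name `died_with_segfault` is hypothetical, and the usage under `__main__` assumes bcftools is on `PATH` and the two input files are in the current directory.

```python
import signal
import subprocess


def died_with_segfault(cmd):
    """Run cmd and report whether it was terminated by SIGSEGV (POSIX only)."""
    result = subprocess.run(
        cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    # On POSIX, a return code of -N means the child was killed by signal N.
    return result.returncode == -signal.SIGSEGV


if __name__ == "__main__":
    cmd = [
        "bcftools", "view",
        "--include", "INFO/TAG=@strings_expected",
        "example.vcf",
    ]
    print(died_with_segfault(cmd))
```

When the command is launched through a shell or a container wrapper instead, the same crash typically shows up as exit status 139 (128 + 11), so the check would need adjusting for that case.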
If TAG is present in the first record, the segfault does not occur:
$ cat example_no_segfault.vcf
##fileformat=VCFv4.2
##FILTER=<ID=PASS,Description="All filters passed">
##INFO=<ID=DP,Number=1,Type=Integer,Description="Approximate read depth (informative and non-informative); some reads may have been filtered based on mapq etc.">
##INFO=<ID=TAG,Number=1,Type=String,Description="This is an example of a string in info.">
##contig=<ID=chr1,length=248956422>
#CHROM POS ID REF ALT QUAL FILTER INFO
chr1 11558102 . G GT . PASS DP=50;TAG=Example
chr1 11558105 . G C . PASS DP=50
chr1 11558108 . A T . PASS DP=51;TAG=It
$ bcftools view --include 'INFO/TAG=@strings_expected' example_no_segfault.vcf
##fileformat=VCFv4.2
##FILTER=<ID=PASS,Description="All filters passed">
##INFO=<ID=DP,Number=1,Type=Integer,Description="Approximate read depth (informative and non-informative); some reads may have been filtered based on mapq etc.">
##INFO=<ID=TAG,Number=1,Type=String,Description="This is an example of a string in info.">
##contig=<ID=chr1,length=248956422>
##bcftools_viewVersion=1.19+htslib-1.19
##bcftools_viewCommand=view --include INFO/TAG=@strings_expected example_no_segfault.vcf; Date=Wed Feb 28 22:21:13 2024
#CHROM POS ID REF ALT QUAL FILTER INFO
chr1 11558102 . G GT . PASS DP=50;TAG=Example
chr1 11558105 . G C . PASS DP=50
Notably, this output includes a variant record that does not have INFO/TAG. I believe this may have been observed previously and may be expected behavior; at the least, a workaround is --include 'INFO/TAG=@strings_expected & INFO/TAG!="."'
I've also noticed a similar segfault if the input file contains only integers. The release notes do state that INFO/TAG=@file.txt supports only strings, so the odd behavior is understandable. Let me know if I should raise a separate issue or provide a minimal example for testing.
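To make the expectation concrete, here is a plain-Python sketch of the semantics I would have expected from INFO/TAG=@file: a record matches only if it carries TAG and its value is listed in the file. This is not how bcftools implements the filter. `parse_info` and `matching_positions` are hypothetical helpers, and the records are copied from example_no_segfault.vcf.

```python
def parse_info(info):
    """Turn 'DP=50;TAG=Example' into {'DP': '50', 'TAG': 'Example'}."""
    fields = {}
    for item in info.split(";"):
        key, _, value = item.partition("=")
        fields[key] = value
    return fields


def matching_positions(records, tag, allowed):
    """Return POS of records whose INFO/<tag> is present and in allowed."""
    hits = []
    for pos, info in records:
        value = parse_info(info).get(tag)  # None when the tag is absent
        if value is not None and value in allowed:
            hits.append(pos)
    return hits


# (POS, INFO) pairs from example_no_segfault.vcf.
RECORDS = [
    (11558102, "DP=50;TAG=Example"),
    (11558105, "DP=50"),
    (11558108, "DP=51;TAG=It"),
]
# Contents of strings_expected.
ALLOWED = {"Example", "Something"}

if __name__ == "__main__":
    print(matching_positions(RECORDS, "TAG", ALLOWED))
```

Under these assumed semantics only the record at 11558102 would be kept, whereas bcftools 1.19 also outputs the record at 11558105.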