samtools / bcftools

This is the official development repository for BCFtools. See installation instructions and other documentation here: http://samtools.github.io/bcftools/howtos/install.html
http://samtools.github.io/bcftools/

Question on subscripting arrays of Number=R tags by alleles found in FORMAT/GT #2133

Closed acorvelo closed 6 months ago

acorvelo commented 6 months ago

Starting in v1.17, it should be possible to subscript arrays of Number=R with the alleles found in FORMAT/GT in filtering expressions, which is a very useful feature. However, either it doesn't work as intended or I'm misunderstanding the following manual sentence and/or use case:

in addition to array subscripts shown above, it is possible to subscript arrays of Number=R tags by alleles found in FORMAT/GT (starting with version 1.17).

Consider the example:

##fileformat=VCFv4.2
##FILTER=<ID=PASS,Description="All filters passed">
##contig=<ID=chr1,length=248956422>
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=TC,Number=R,Type=Character,Description=".">
##FORMAT=<ID=TS,Number=R,Type=Integer,Description=".">
##bcftools_viewVersion=1.19+htslib-1.19
##bcftools_viewCommand=view -i 'FORMAT/TS[*:GT] != 9'; Date=Tue Mar 19 17:43:41 2024
#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  SAMPLE
chr1    14542668    .   CAA C,CA    .   .   .   GT:TC:TS    2/2:T,N,A:18,0,9

The bcftools command recorded in the header (##bcftools_viewCommand) should not have produced this record, given that the only allele in GT is 2 and the corresponding TS value is 9.

Interestingly, FORMAT/TS[*:GT] < 9 and FORMAT/TS[*:GT] > 9 produce no records, which is the intended behavior.
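To make the expected result concrete, here is a minimal Python sketch of how I understand GT-based subscripting for the sample above. It is not bcftools code: the helper names (gt_alleles, subscript_by_gt) are hypothetical, and the "pass if any selected value matches" rule is an assumption made for illustration only.

```python
def gt_alleles(gt):
    """Parse allele indexes from a GT string such as '2/2' or '0|1'.
    Missing alleles ('.') are skipped."""
    return [int(a) for a in gt.replace("|", "/").split("/") if a != "."]


def subscript_by_gt(values, gt):
    """Pick the entries of a Number=R array (one value per allele,
    REF first) that correspond to the alleles present in GT."""
    return [values[i] for i in gt_alleles(gt)]


# Sample column from the record above, FORMAT = GT:TC:TS
gt, tc, ts = "2/2:T,N,A:18,0,9".split(":")

selected_ts = subscript_by_gt([int(v) for v in ts.split(",")], gt)
selected_tc = subscript_by_gt(tc.split(","), gt)

# Assumption: an expression passes if ANY selected value satisfies it.
passes_ne = any(v != 9 for v in selected_ts)
passes_lt = any(v < 9 for v in selected_ts)
passes_gt = any(v > 9 for v in selected_ts)

print(selected_ts, selected_tc, passes_ne, passes_lt, passes_gt)
```

Under these assumptions only the value at index 2 is selected (once per GT allele), so none of the three comparisons against 9 should match and the record should be excluded in all three cases.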

I also can't make it work properly with Type=Float, Type=Character and Type=String fields.

Any help with clarifying this issue would be great. Thanks.

pd3 commented 6 months ago

Hopefully this is fixed by the commit https://github.com/samtools/bcftools/commit/c7628e8662489a0284a02867ef8e87d686b2a9d7. Please try it out. Thanks for the issue and the test case.