samtools / bcftools

This is the official development repository for BCFtools. See installation instructions and other documentation here: http://samtools.github.io/bcftools/howtos/install.html
http://samtools.github.io/bcftools/

Substring matching for variant IDs to aid in filtering records with multiple IDs? #2190

Closed rajwanir closed 4 months ago

rajwanir commented 4 months ago

Hello,

I have a VCF file in which some variants are assigned multiple IDs. For example:

chr11_KI270721v1_random 78026 c11_pos2000643;rs72849239 G A . PASS . GT:GQ 0/0:5 ./.:. 0/0:4 ./.:. ./.:. ./.:. ./.:. ./.:. ./.:. ./.:.

I wish to query this VCF with a file containing SNP IDs. This works fine, except that variants with multiple identifiers are not included in the output. To illustrate the issue:

  1. A 'testid' file containing c11_pos2000643 outputs nothing with the following command: bcftools query -f"%POS" -i'ID=@testid' merged.vcf.gz

  2. No output with a direct ID equality comparison: bcftools query -f"%POS" -i'ID=="c11_pos2000643"' merged.vcf.gz

  3. This works and returns the position: bcftools query -f"%POS" -i'ID~"c11_pos2000643"' merged.vcf.gz

Essentially, is it possible to use the similarity/subset operator ~ with a SNP list? If not, how can I query the records with multiple IDs?

I am working with BCFtools version 1.20.

Thank you.

pd3 commented 4 months ago

This is now fixed, and all the possibilities should work as intuitively expected. Please try it out.
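
For anyone stuck on a build without the fix, the matching being asked for can be sketched outside of bcftools. The Python below is only an illustration, not how bcftools implements it: it assumes a tab-delimited VCF whose ID column holds semicolon-separated IDs (as in the record above), and the helper names load_ids and filter_by_any_id are hypothetical. It splits each ID field on ';' and tests every element for exact membership in the list, which also avoids the over-matching a regex test with ~ can produce (for example, c11_pos2000643 also being found inside c11_pos20006430).

```python
from typing import Iterable, Iterator, Set


def load_ids(lines: Iterable[str]) -> Set[str]:
    """Collect one ID per line, ignoring blank lines (hypothetical helper)."""
    return {line.strip() for line in lines if line.strip()}


def filter_by_any_id(vcf_lines: Iterable[str], wanted: Set[str]) -> Iterator[str]:
    """Yield VCF data lines where any of the semicolon-separated IDs is wanted."""
    for line in vcf_lines:
        if line.startswith("#"):
            continue  # skip meta-information and header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # not a well-formed data line
        # Column 3 is ID; multiple IDs are separated by ';'
        record_ids = fields[2].split(";")
        if any(record_id in wanted for record_id in record_ids):
            yield line


if __name__ == "__main__":
    # Made-up records modelled on the example in this thread
    records = [
        "chr11\t78026\tc11_pos2000643;rs72849239\tG\tA\t.\tPASS\t.\n",
        "chr11\t78200\tc11_pos20006430\tG\tA\t.\tPASS\t.\n",
    ]
    wanted = load_ids(["c11_pos2000643\n"])
    for hit in filter_by_any_id(records, wanted):
        print(hit.split("\t")[1])
```

For a compressed file, the lines could be read with gzip.open(path, "rt") from the standard library and passed in the same way.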