samtools / bcftools

This is the official development repository for BCFtools. See installation instructions and other documentation here: http://samtools.github.io/bcftools/howtos/install.html
http://samtools.github.io/bcftools/

Annotate fails to add INFO when matching structural variants by POS,~ID,REF,ALT #1739

Open Han-Cao opened 2 years ago

Han-Cao commented 2 years ago

Hi,

I am using bcftools 1.15.1 and want to add an INFO annotation to a VCF file of structural variants. The VCF looks like this:

chr1    10627   ID1 N   <INS>   .   PASS    some_INFO
chr1    90238   ID2 N   <INS>   .   PASS    some_INFO
chr1    90337   ID3 N   <INS>   .   PASS    some_INFO
chr1    90388   ID4 N   <INS>   .   PASS    some_INFO
chr1    90412   ID5 N   <INS>   .   PASS    some_INFO
chr1    90416   ID6 N   <INS>   .   PASS    some_INFO

The annotation file looks like this:

chr1    10627   ID1 N   <INS>   -0.999642
chr1    90238   ID2 N   <INS>   1.25927
chr1    90337   ID3 N   <INS>   1.42278
chr1    90388   ID4 N   <INS>   1.14841
chr1    90412   ID5 N   <INS>   1.19886
chr1    90416   ID6 N   <INS>   -0.556859

When I first tried to annotate by matching on POS,~ID,REF,ALT, as suggested by the manual (bcftools annotate -a annots.tab.gz -c CHROM,POS,~ID,REF,ALT,INFO/VAL input.vcf), no values were added to INFO.

However, after removing the REF and ALT columns from the annotation file and annotating with -c CHROM,POS,~ID,INFO/VAL, I get the expected output.
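The workaround described above can be sketched roughly as follows. This is a minimal sketch, not a confirmed recipe: the file names annots.noalleles.tab and header.txt are made up for illustration, and the header line assumes VAL is a single Float value with a placeholder description. Only the -c column list and input.vcf come from the report. The bcftools step runs only if bgzip, tabix, and bcftools are on PATH and input.vcf exists.

```shell
# Two rows of the annotation table from the report (tab-separated):
# CHROM, POS, ID, REF, ALT, VAL
printf 'chr1\t10627\tID1\tN\t<INS>\t-0.999642\n' >  annots.tab
printf 'chr1\t90238\tID2\tN\t<INS>\t1.25927\n'   >> annots.tab

# Drop REF and ALT (columns 4 and 5), keeping CHROM, POS, ID, VAL.
cut -f1-3,6 annots.tab > annots.noalleles.tab

# Header line declaring the new INFO tag.
# Assumption: VAL is one Float per record; the description is a placeholder.
printf '##INFO=<ID=VAL,Number=1,Type=Float,Description="Annotation value">\n' > header.txt

# Compress, index, and annotate, matching on CHROM, POS and ID only.
if command -v bgzip >/dev/null 2>&1 && command -v tabix >/dev/null 2>&1 \
   && command -v bcftools >/dev/null 2>&1 && [ -f input.vcf ]; then
    bgzip -f annots.noalleles.tab
    tabix -f -s1 -b2 -e2 annots.noalleles.tab.gz
    bcftools annotate -a annots.noalleles.tab.gz -h header.txt \
        -c CHROM,POS,~ID,INFO/VAL input.vcf
fi
```

Because REF and ALT are no longer listed in -c, records are paired on position and ID alone, which is the configuration the report says produced the expected output.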

By the way, I first tried to annotate from a VCF, where I found the note "todo: -c ~ID with -a VCF?". Does this mean the feature is on the to-do list and that, for now, we can only annotate this way from a TSV file?

Thanks, Han

pd3 commented 1 year ago

Hi, yes, that's correct. This has not yet been implemented for the case of annotating from a VCF.

tobiasrausch commented 9 months ago

Matching with symbolic ALTs or based on IDs would indeed be very useful for structural variants: bcftools annotate --pair-logic id ...

Thanks!

maggs-x commented 3 weeks ago

Hi. I have a VCF that contains both SNPs and structural variants. The VCF format is very simple: there is no type in the INFO column, so we infer structural insertions and deletions from the length of the ALT allele compared to the REF. I noticed that I get different results from bcftools annotate depending on whether the VCF has multiallelic sites joined or not. For example, fewer genes are annotated in the VCF that has multiallelic sites joined.

I've been unable to determine if the problem is related to how 'bcftools annotate' handles multiallelic sites, or if it is how it handles structural insertions and deletions (50bp-100kbp). Please let me know. Thank you.
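The length-based inference described in this comment could look something like the awk sketch below. It is an illustration under stated assumptions, not the poster's actual pipeline: the toy records, the file names toy.tsv and classes.tsv, and the class labels are invented, and the 50 bp and 100 kbp bounds are taken from the size range mentioned above. Each ALT allele of a joined multiallelic record is classified separately, so a single site can carry alleles of different classes.

```shell
# Toy records (CHROM, POS, ID, REF, ALT), tab-separated; all values are made up.
awk 'BEGIN {
  OFS = "\t"
  pad = sprintf("%60s", ""); gsub(/ /, "C", pad)   # a 60-base filler sequence
  print "chr1", 100, ".", "A", "G"                 # SNP
  print "chr1", 200, ".", "A", "A" pad             # 60 bp insertion
  print "chr1", 300, ".", "A" pad, "A"             # 60 bp deletion
  print "chr1", 400, ".", "A", "AT,A" pad          # joined multiallelic site
}' > toy.tsv

# Classify every ALT allele by its length difference from REF.
# min and max are the assumed structural-variant size bounds in bp.
awk -F'\t' -v OFS='\t' -v min=50 -v max=100000 '
{
  n = split($5, alts, ",")
  for (i = 1; i <= n; i++) {
    d = length(alts[i]) - length($4)
    size = (d < 0) ? -d : d
    if (alts[i] ~ /^</)                     type = "SYMBOLIC"
    else if (size >= min && size <= max)    type = (d > 0) ? "INS" : "DEL"
    else if (d == 0 && length($4) == 1)     type = "SNP"
    else                                    type = "OTHER"
    print $1, $2, i, type                   # i is the 1-based ALT allele index
  }
}' toy.tsv > classes.tsv

cat classes.tsv
```

Comparing such per-allele classes between the joined and the split file is one way to see which records differ; it does not by itself say how bcftools annotate treats them.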

Maggs