samtools / bcftools

This is the official development repository for BCFtools. See installation instructions and other documentation here: http://samtools.github.io/bcftools/howtos/install.html
http://samtools.github.io/bcftools/

bcftools norm shifts symbolic <DEL> to position 1 without warning if the END tag is missing from VCF #2216

Open davmlaw opened 3 months ago

davmlaw commented 3 months ago

Found while testing the changes in #1919

Omitting the END INFO tag causes bcftools norm to shift symbolic <DEL> alts to position 1 with no warning (<DUP> alts are unaffected).

Sample output line:

NC_000003.11    1   .   N   <DEL>   .   PASS    SVTYPE=DEL;SVLEN=-2666;BCFTOOLS_OLD_VARIANT=NC_000003.11|128204048|G|<DEL>

Command:

bcftools norm --fasta-ref=/data/annotation/fasta/GCF_000001405.25_GRCh37.p13_genomic.fna.gz --old-rec-tag=BCFTOOLS_OLD_VARIANT del_normalize_test_no_end.GRCh37.vcf

File: del_normalize_test_no_end.GRCh37.vcf.txt
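Until this is resolved, a pre-flight check on the input may help. The following is a minimal sketch in plain Python (no htslib or pysam) that lists records with a symbolic ALT and no END in INFO. The function names parse_info and symbolic_without_end are hypothetical, and the check treats any ALT of the form <...> as symbolic, so gVCF-style alleles such as <*> would be reported too.

```python
def parse_info(field):
    """Split a VCF INFO column into a dict; flag-only keys map to ''."""
    info = {}
    if field in (".", ""):
        return info
    for item in field.split(";"):
        key, _, value = item.partition("=")
        info[key] = value
    return info


def symbolic_without_end(lines):
    """Return (CHROM, POS, ALT) for records with a symbolic ALT and no END tag.

    `lines` is any iterable of uncompressed VCF text lines.
    """
    hits = []
    for line in lines:
        # Skip blank lines and both kinds of header line (## and #CHROM).
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        chrom, pos, alt = cols[0], int(cols[1]), cols[4]
        info = parse_info(cols[7])
        has_symbolic = any(
            a.startswith("<") and a.endswith(">") for a in alt.split(",")
        )
        if has_symbolic and "END" not in info:
            hits.append((chrom, pos, alt))
    return hits
```

For the attached file this could be run as symbolic_without_end(open("del_normalize_test_no_end.GRCh37.vcf")), assuming the file is uncompressed; each reported record is a candidate for the shift described above.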

It is not clear to me from the VCF spec whether the END tag is required for symbolic variants.

"an explicit END INFO field provides variant span information that is otherwise unknown. ... This field is used to compute BCF’s rlen field"

Ideally, you should be able to use SVLEN to get the rlen, but if the END tag is required, it would be better to:

If it is an error or warning, it would be nice for it to be noted in bcftools view as well. Thanks!
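The SVLEN-based fallback suggested above could be sketched roughly as follows. This is not how htslib or bcftools compute rlen; it is an illustration under the assumption that POS is the padding base preceding the deleted sequence, so a <DEL> spans |SVLEN| + 1 reference bases. The function name ref_length is hypothetical, and only <DEL> is handled in the fallback.

```python
def ref_length(pos, ref, alt, info):
    """Estimate the reference span (rlen) of one VCF record.

    pos  : 1-based POS
    ref  : REF allele string
    alt  : a single ALT allele string
    info : dict of INFO key -> string value
    """
    # An explicit END wins: the span is POS..END inclusive.
    if "END" in info:
        return int(info["END"]) - pos + 1
    # Assumed fallback for a symbolic deletion: the padding base at POS
    # plus the deleted bases. abs() accepts both the signed (-2666) and
    # unsigned (2666) SVLEN conventions.
    if alt == "<DEL>" and "SVLEN" in info:
        return abs(int(info["SVLEN"])) + 1
    # Otherwise fall back to the length of the REF allele.
    return len(ref)
```

For the record in this report (POS 128204048, SVLEN=-2666) the sketch gives a span of 2667 bases, the same value an END of 128206714 would give.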

davmlaw commented 2 months ago

FYI, the END INFO field has been deprecated in VCF 4.5.

davmlaw commented 1 month ago

I think bcftools does the right thing here by using rlen, so this looks like an htslib issue instead.