Open davmlaw opened 3 months ago
Found while testing the changes in #1919
Leaving off the END INFO tag causes <DEL> symbolic ALTs to be shifted to position 1 with no warning (<DUP> records are fine).
Sample output line:
NC_000003.11 1 . N <DEL> . PASS SVTYPE=DEL;SVLEN=-2666;BCFTOOLS_OLD_VARIANT=NC_000003.11|128204048|G|<DEL>
Command:
bcftools norm --fasta-ref=/data/annotation/fasta/GCF_000001405.25_GRCh37.p13_genomic.fna.gz --old-rec-tag=BCFTOOLS_OLD_VARIANT del_normalize_test_no_end.GRCh37.vcf
File: del_normalize_test_no_end.GRCh37.vcf.txt
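To check a file for affected records before running norm, here is a minimal Python sketch (not part of bcftools) that flags data lines with a symbolic ALT and no END in INFO. The function names are hypothetical. The example record is reconstructed from the BCFTOOLS_OLD_VARIANT value above, so its INFO column is an assumption about the attached file.

```python
def parse_info(info):
    """Split a VCF INFO string into a dict; flag-style keys map to True."""
    fields = {}
    if info in ("", "."):
        return fields
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True
    return fields


def symbolic_without_end(line):
    """Return True if a tab-separated VCF data line has a symbolic ALT
    (such as <DEL>) but no END key in its INFO column."""
    cols = line.rstrip("\n").split("\t")
    alts = cols[4].split(",")
    info = parse_info(cols[7])
    has_symbolic = any(a.startswith("<") and a.endswith(">") for a in alts)
    return has_symbolic and "END" not in info


# Assumed input record, rebuilt from BCFTOOLS_OLD_VARIANT (chrom|pos|ref|alt).
record = "NC_000003.11\t128204048\t.\tG\t<DEL>\t.\tPASS\tSVTYPE=DEL;SVLEN=-2666"
print(symbolic_without_end(record))  # prints True
```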
It is not clear to me from the VCF spec whether the END tag is required for symbolic variants.
The spec says: "an explicit END INFO field provides variant span information that is otherwise unknown. ... This field is used to compute BCF’s rlen field"
Ideally, you should be able to use SVLEN to get the rlen, but if the END tag is required, it would be better to:
If it is an error or warning, it would be nice for it to be noted in bcftools view as well. Thanks!
FYI, the END INFO field has been deprecated in VCF 4.5.
I think bcftools does the right thing here by using rlen; this looks like an htslib issue instead.
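To illustrate the SVLEN fallback suggested in the original report, here is a rough Python sketch of how a reference span could be derived for a <DEL> record when END is absent. It is not what htslib or bcftools actually do. It assumes the usual VCF convention that POS is the base before the deleted sequence, so END = POS + |SVLEN|. The function name and the fallback to the REF length are illustrative choices only.

```python
def deletion_rlen(pos, ref, end=None, svlen=None):
    """Return the reference span (rlen) for a <DEL> record.

    Prefers an explicit END, falls back to SVLEN, and finally to len(REF).
    Assumption: POS is the padding base before the deletion, so
    END = POS + |SVLEN|. abs() accepts both negative and positive SVLEN.
    """
    if end is None and svlen is not None:
        end = pos + abs(svlen)
    if end is None:
        # No span information at all: only REF itself is covered.
        return len(ref)
    return end - pos + 1


# Values taken from the sample record in this issue.
print(deletion_rlen(128204048, "G", svlen=-2666))  # prints 2667
```

With a span computed this way, the record would keep its original POS instead of being treated as a one-base variant.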