ASLeonard closed this issue 2 months ago
I don't think this is documented, but these appear to be additional possible tags, and VKX is a unique key per variant record.
https://github.com/samtools/bcftools/blob/466ceaebdd98acf02a7aa464f3afbcb280c0cc5a/convert.c#L49-L81
It seems to be unique even when "%CHROM_%POS_%TYPE" is not, because these long insertions start at the same coordinates.
That is correct, VariantKey, described at https://github.com/tecnickcom/variantkey, can be used. Note, however, that it only includes the first ALT allele at multiallelic sites.
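If the first-ALT limitation matters, one alternative is to derive a short fixed-length ID from all alleles yourself. The following is a minimal sketch in Python and is not VariantKey: it simply hashes CHROM, POS, REF and every ALT allele with BLAKE2b. The function name `compact_id` and the 8-byte digest size are assumptions made for illustration. Such an ID is not reversible, and collisions are possible in principle, although very unlikely at this digest size.

```python
import hashlib


def compact_id(chrom, pos, ref, alts, digest_bytes=8):
    """Return a fixed-length hex ID built from CHROM, POS, REF and all ALT alleles.

    Hypothetical helper, not part of bcftools or VariantKey.
    """
    # Join the fields with a tab so that field boundaries stay unambiguous.
    key = "\t".join([chrom, str(pos), ref, ",".join(alts)])
    # digest_bytes=8 yields 16 hex characters regardless of allele length.
    return hashlib.blake2b(key.encode("ascii"), digest_size=digest_bytes).hexdigest()


if __name__ == "__main__":
    # Two long insertions starting at the same coordinate get different IDs.
    print(compact_id("1", 1000, "A", ["A" + "ACGT" * 50]))
    print(compact_id("1", 1000, "A", ["A" + "TTGA" * 50]))
```

Because the ID length is fixed, it stays short even for very long SV alleles, and the same record always maps to the same ID.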
I am interested in setting a compact and unique variant ID with `bcftools annotate --set-id`, where different variants will likely share the same chromosome and start position (multiple long SV alleles). Programs like plink complain if the ID is too long, so I can't use "%CHROM_%POS_%REF_%ALT", which would be unique. I was able to add a counter variable and force that in here for tmpks
https://github.com/samtools/bcftools/blob/466ceaebdd98acf02a7aa464f3afbcb280c0cc5a/vcfannotate.c#L3351 but I was wondering if there was a better/general way of doing this. The variant IDs could also be completely random, as long as I can make a map between "compact, unique, plink compatible IDs" and the real "%CHROM_%POS_%REF_%ALT" IDs.
Best, Alex
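The counter-plus-map idea from the question can also be done outside bcftools. Below is a minimal sketch in Python, assuming an uncompressed, tab-delimited VCF read as text lines; the function name `assign_compact_ids` and the `v` prefix are hypothetical choices, not anything bcftools provides. It replaces the ID column with a running counter and keeps a map back to the "%CHROM_%POS_%REF_%ALT" style ID.

```python
def assign_compact_ids(vcf_lines, prefix="v"):
    """Replace the ID column with a running counter.

    Returns (new_lines, id_map), where id_map maps each compact ID to
    "CHROM_POS_REF_ALT". Hypothetical helper for illustration only.
    """
    new_lines = []
    id_map = {}
    counter = 0
    for line in vcf_lines:
        line = line.rstrip("\n")
        if line.startswith("#"):
            # Header lines are passed through unchanged.
            new_lines.append(line)
            continue
        fields = line.split("\t")
        chrom, pos, _old_id, ref, alt = fields[:5]
        counter += 1
        new_id = f"{prefix}{counter}"
        id_map[new_id] = f"{chrom}_{pos}_{ref}_{alt}"
        fields[2] = new_id  # ID is the third VCF column
        new_lines.append("\t".join(fields))
    return new_lines, id_map


if __name__ == "__main__":
    records = [
        "##fileformat=VCFv4.2",
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
        "1\t1000\t.\tA\tAACGTACGT\t.\t.\t.",
        "1\t1000\t.\tA\tATTGATTGA\t.\t.\t.",
    ]
    lines, id_map = assign_compact_ids(records)
    for compact, full in id_map.items():
        print(compact, full, sep="\t")
```

The IDs are unique by construction and stay short, but they depend on record order, so the map has to be kept alongside the rewritten file.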