samtools / bcftools

This is the official development repository for BCFtools. See installation instructions and other documentation here http://samtools.github.io/bcftools/howtos/install.html
http://samtools.github.io/bcftools/

Setting variant ID as the Nth record #2157

Closed ASLeonard closed 2 months ago

ASLeonard commented 2 months ago

I am interested in setting a compact, unique variant ID with bcftools annotate --set-id in a callset where different variants are likely to share the same chromosome and start position (multiple long SV alleles). Programs like plink complain if the ID is too long, so I can't use "%CHROM_%POS_%REF_%ALT", which would be unique. I was able to add a counter variable and force it in here for tmpks: https://github.com/samtools/bcftools/blob/466ceaebdd98acf02a7aa464f3afbcb280c0cc5a/vcfannotate.c#L3351

but I was wondering whether there is a better, more general way of doing this. The variant IDs could even be completely random, as long as I can build a map between the "compact, unique, plink-compatible IDs" and the real "%CHROM_%POS_%REF_%ALT" IDs.
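To illustrate the kind of mapping being asked for, here is a minimal sketch in plain Python rather than bcftools. It rewrites the ID column of an uncompressed, tab-delimited VCF with a running counter and records which "%CHROM_%POS_%REF_%ALT" string each new ID stands for. The function name assign_compact_ids and the "v" prefix are hypothetical choices for this example, not part of bcftools.

```python
def assign_compact_ids(vcf_lines, prefix="v"):
    """Replace the ID column with prefix + running counter.

    Returns (rewritten_lines, mapping), where mapping is a list of
    (new_id, "CHROM_POS_REF_ALT") tuples in record order.
    Assumes well-formed, tab-delimited VCF text lines.
    """
    rewritten, mapping = [], []
    counter = 0
    for line in vcf_lines:
        line = line.rstrip("\n")
        # Header lines and blank lines are passed through unchanged.
        if not line or line.startswith("#"):
            rewritten.append(line)
            continue
        fields = line.split("\t")
        counter += 1
        new_id = f"{prefix}{counter}"
        chrom, pos, _old_id, ref, alt = fields[:5]
        mapping.append((new_id, f"{chrom}_{pos}_{ref}_{alt}"))
        fields[2] = new_id  # ID is the third VCF column
        rewritten.append("\t".join(fields))
    return rewritten, mapping
```

Because the ID is just the record's ordinal, it stays short no matter how long the SV alleles are, and writing the mapping out as a two-column table lets the original description be recovered after the plink step.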

Best, Alex

ASLeonard commented 2 months ago

I don't think this is documented, but it appears that these are additional supported tags, and that VKX gives a unique key per variant record: https://github.com/samtools/bcftools/blob/466ceaebdd98acf02a7aa464f3afbcb280c0cc5a/convert.c#L49-L81

It seems to be unique even when "%CHROM_%POS_%TYPE" is not, because these long insertions start at the same coordinates.

pd3 commented 2 months ago

That is correct: VariantKey, described at https://github.com/tecnickcom/variantkey, can be used. Note, however, that it only includes the first ALT allele at multiallelic sites.
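Since the key only covers the first ALT allele, it may be worth confirming that the IDs in the annotated file really are unique before passing it to plink. A minimal sketch of such a check, again in plain Python over uncompressed VCF text lines; the helper name duplicated_ids is hypothetical, and records with a missing ID (".") are deliberately ignored here.

```python
from collections import Counter


def duplicated_ids(vcf_lines):
    """Return {id: count} for every ID that occurs on more than one record.

    Assumes well-formed, tab-delimited VCF text; header and blank lines
    are skipped, and missing IDs (".") are not counted as duplicates.
    """
    counts = Counter(
        line.split("\t")[2]
        for line in vcf_lines
        if line.strip() and not line.startswith("#")
    )
    return {vid: n for vid, n in counts.items() if n > 1 and vid != "."}
```

An empty result means every record carries a distinct ID; any entry returned points at records that would need a different scheme.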