Can I add a question to the How-to guide: how do you perform the equivalent of `bcftools norm --multiallelics` on a VCF stored in sgkit? Is this even possible? In particular, we have VCFs in which multiallelic SNPs have been split into multiple sites all at the same position (yuck!), and it would be great to be able to get them back into a sane state without having to go through the VCF pipeline multiple times.
I don't know if this is a reasonable thing to want to do in sgkit, however. Here's the quote from the bcftools docs:
-m, --multiallelics -|+[snps|indels|both|any]
split multiallelic sites into biallelic records (-) or join biallelic sites into multiallelic records (+). An optional type string can follow which controls variant types which should be split or merged together: If only SNP records should be split or merged, specify snps; if both SNPs and indels should be merged separately into two records, specify both; if SNPs and indels should be merged into a single record, specify any.
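To make the join (`+`) direction concrete, here is a rough plain-Python sketch of the merge I have in mind. It is not an sgkit or bcftools API: the `join_biallelic` function and the record layout (dicts with `chrom`, `pos`, `ref`, `alt`, `gt`) are hypothetical, made up for illustration. It assumes records are sorted so that split records for the same site are adjacent, that each split record codes its own ALT as allele 1, and that allele slots line up across records (e.g. phased calls). Real bcftools behaviour covers many more cases (indels, INFO/FORMAT fields, unphased genotypes).

```python
from itertools import groupby

MISSING = -1  # assumed sentinel for a missing allele call


def join_biallelic(records):
    """Join adjacent biallelic records sharing (chrom, pos, ref) into one record.

    Each record is a dict: chrom, pos, ref, alt (str), gt (list of per-sample
    tuples holding 0, 1, or MISSING). Hypothetical layout, for illustration only.
    """
    merged = []
    site_key = lambda r: (r["chrom"], r["pos"], r["ref"])
    for (chrom, pos, ref), group in groupby(records, key=site_key):
        group = list(group)
        alts = [r["alt"] for r in group]
        genotypes = []
        for s in range(len(group[0]["gt"])):
            call = []
            for slot in range(len(group[0]["gt"][s])):
                alleles = [r["gt"][s][slot] for r in group]
                # Record i carrying allele 1 means merged allele index i + 1.
                carried = [i + 1 for i, a in enumerate(alleles) if a == 1]
                if len(carried) > 1:
                    raise ValueError(f"conflicting ALT calls at {chrom}:{pos}")
                if carried:
                    call.append(carried[0])
                elif MISSING in alleles:
                    call.append(MISSING)
                else:
                    call.append(0)
            genotypes.append(tuple(call))
        merged.append(
            {"chrom": chrom, "pos": pos, "ref": ref, "alt": alts, "gt": genotypes}
        )
    return merged


# Two split records at position 100, plus an ordinary biallelic site at 200.
records = [
    {"chrom": "1", "pos": 100, "ref": "A", "alt": "C", "gt": [(0, 1), (1, 0)]},
    {"chrom": "1", "pos": 100, "ref": "A", "alt": "T", "gt": [(1, 0), (0, 0)]},
    {"chrom": "1", "pos": 200, "ref": "G", "alt": "A", "gt": [(0, 0), (1, 1)]},
]
merged = join_biallelic(records)
```

Here the two records at position 100 collapse into a single site with ALT alleles `C` and `T`, and the site at 200 passes through unchanged. What I don't know is how an operation like this would map onto the variant dimension of an sgkit dataset, since it changes the number of variants.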