I'm not clear if this is expected behavior or a bug.
This is a test region of chr22 containing a selected complex polyallelic variant (SNP + indels).
I am using bcftools 1.20; the file formats are VCF 4.2.
$BCFTOOLS -v
bcftools 1.20
Using htslib 1.20
Copyright (C) 2024 Genome Research Ltd.
License Expat: The MIT/Expat license
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Individual whole-genome gVCFs were generated with DRAGEN. I am attempting to merge 1379 gVCF files using the following pipeline (same samples as #1891):
In the resulting BCF file I find the following two variants at the same location. There may be other complex variants, but this is an exemplar. Since there are 1379 samples, I am showing only the first sample (first 10 columns):
If I repeat the merge with --merge both,* I still see the two variants; the only difference seems to be that the alleles have been removed:
If I compare the samples contributing to variant 1 and variant 2 (those with a non-missing genotype/FORMAT entry), 37 of 1379 have a non-missing genotype in both variants. The majority are non-missing in one or the other.
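For anyone wanting to reproduce this kind of count, here is a minimal sketch of the comparison. It assumes the per-sample GT strings of the two records have already been extracted in the same sample order (for example with something like `bcftools query -f '[%GT\t]\n'` restricted to the site), and it looks only at GT, not at other FORMAT fields. The function names and the toy genotypes are hypothetical, not taken from the real data.

```python
def is_missing(gt: str) -> bool:
    """True if no allele is called, e.g. './.', '.|.' or '.'.

    Assumption: a partially missing genotype such as './1' counts as called.
    """
    alleles = gt.replace("|", "/").split("/")
    return all(allele == "." for allele in alleles)


def overlap_counts(gts_a, gts_b):
    """Tally samples called in both records, in only one, or in neither.

    gts_a and gts_b are GT strings for the same samples in the same order.
    """
    if len(gts_a) != len(gts_b):
        raise ValueError("both records must list the same number of samples")
    counts = {"both": 0, "only_a": 0, "only_b": 0, "neither": 0}
    for gt_a, gt_b in zip(gts_a, gts_b):
        called_a = not is_missing(gt_a)
        called_b = not is_missing(gt_b)
        if called_a and called_b:
            counts["both"] += 1
        elif called_a:
            counts["only_a"] += 1
        elif called_b:
            counts["only_b"] += 1
        else:
            counts["neither"] += 1
    return counts


# Toy genotypes for five hypothetical samples (not the real data).
record_1 = ["0/1", "./.", "1/2", "./.", "."]
record_2 = ["./.", "0/1", "0/2", "./.", "0|1"]
print(overlap_counts(record_1, record_2))
```

With the real data, the "both" tally would correspond to the 37 samples mentioned above.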
So, my question: should I see only one variant entry when using --merge both or --merge both,*? (The maximum number of alleles doesn't appear to be exceeded here.)
If not, what attributes force the variants to remain separate?
Thanks very much,