There are some errors in the decomposed VCFs, which is likely caused by vcfwave:
Some of the decomposed variants have incorrect genotypes and INFO
The decomposed INFO/AT do not match the alleles. It looks like vcfwave treat this field as Number=A instead of Number=R
To quickly replicate this issue:
bcftools view -H -r chr22:11021446-11021447 -s HG01109 hprc-v1.1-mc-chm13.raw.vcf.gz
chr22 11021446 >55070592>55070595 TT T,TTT 60 . AC=1,0;AF=0.0857143,0.0571429;AN=2;AT=>55070592>55070593>55070595,>55070592>55070595,>55070592>55070593>55070594>55070595;NS=44;LV=0 GT 0|1
bcftools view -H -r chr22:11021446-11021447 -s HG01109 hprc-v1.1-mc-chm13.vcfbub.a100k.wave.vcf.gz
[W::bcf_calc_ac] Incorrect number of AC fields at chr22:11021447. (This message is printed only once.)
chr22 11021446 >55070592>55070595_1 TT T 60 . AC=1;AF=0.085714;AN=2;AT=>55070592>55070593>55070595;NS=44;LV=0;ORIGIN=chr22:11021446;LEN=1;TYPE=del GT 0|1
chr22 11021447 >55070592>55070595_2 T TT 60 . AC=1;AF=0.0857143,0.0571429;AN=2;AT=>55070592>55070593>55070595,>55070592>55070595,>55070592>55070593>55070594>55070595;NS=44;LV=0;ORIGIN=chr22:11021446;LEN=1;TYPE=ins GT 0|1
Genotype error:
The >55070592>55070595 record has 2 alternative alleles, the first one is deletion (TT -> T) and the second one is insertion (TT -> TTT). In raw VCF, HG01109 has one copy of the deletion on haplotype2. While in the decomposed VCF, HG01109 has both the deletion and insertion on haplotype2.
INFO error
The AT in decomposed VCF don't match the input VCF. Moreover, it seems all INFO for >55070592>55070595_2 are still the same as the raw VCF.
The error records are reported in graph.vcf.log like below. In chr22, there are 3111 errors.
Two overlapping variants at same haplotype at chr22:11021446, set allele to missing.
Two overlapping variants at same haplotype at chr22:11023348, set allele to missing.
Two overlapping variants at same haplotype at chr22:11028129, set allele to missing.
...
Hi @glennhickey ,
There are some errors in the decomposed VCFs, which is likely caused by vcfwave:
INFO
INFO/AT
do not match the alleles. It looks like vcfwave treat this field asNumber=A
instead ofNumber=R
To quickly replicate this issue:
Genotype error: The
>55070592>55070595
record has 2 alternative alleles, the first one is deletion (TT -> T) and the second one is insertion (TT -> TTT). In raw VCF, HG01109 has one copy of the deletion on haplotype2. While in the decomposed VCF, HG01109 has both the deletion and insertion on haplotype2.INFO error The
AT
in decomposed VCF don't match the input VCF. Moreover, it seems all INFO for>55070592>55070595_2
are still the same as the raw VCF.I also report this issue to vcfwave, you may refer to https://github.com/vcflib/vcflib/issues/403 for details .
Besides, this issue was initially identified when I tried to run this Pangenie pipeline.
The error records are reported in
graph.vcf.log
like below. In chr22, there are 3111 errors.