Closed MBeyens closed 6 years ago
This is a limitation/bug of vtools update in that the calculation of #(alt) etc. does not make use of phenotype (sex) information. The manual states that
The maf() function treats chromosomes 1 to 22 as autosomes, X and Y as sex chromosomes, and other chromosomes as single-copy manifolds.
and does not mention anything similar for other functions.
To investigate a potential fix for this problem, I will need to know exactly how variants on the X chromosome are encoded in your VCF file. Could you upload (or PM bpeng@mdanderson.org if you prefer), for example, 1000 variants on the X chromosome from your data, the related phenotype file, and the commands that demonstrate the problem?
The problem is caused by incorrect encoding of genotypes of male individuals on chromosome X (coded as 1/1), so an admin command is needed to fix the input genotype.
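To check whether a given VCF is affected, something like the following could flag male samples whose X-chromosome genotype is coded as 1/1. This is only a rough sketch and not part of vtools: the function name, the sample names, and the "M"/"F" sex coding are hypothetical, and it assumes a tab-delimited VCF data line with a GT entry in the FORMAT column.

```python
def male_hom_alt_on_x(vcf_line, sample_names, sex):
    """Return the male samples coded as 1/1 on chromosome X for one VCF data line.

    sample_names: sample IDs in VCF column order (hypothetical input).
    sex: dict mapping sample ID to "M" or "F" (hypothetical coding).
    """
    fields = vcf_line.rstrip("\n").split("\t")
    if fields[0] not in ("X", "chrX"):
        return []  # only X-chromosome variants are of interest here
    # Locate the GT entry within the FORMAT column (column 9 of a VCF line).
    gt_index = fields[8].split(":").index("GT")
    flagged = []
    for name, call in zip(sample_names, fields[9:]):
        # Treat phased (1|1) and unphased (1/1) calls the same way.
        gt = call.split(":")[gt_index].replace("|", "/")
        if sex.get(name) == "M" and gt == "1/1":
            flagged.append(name)
    return flagged


# Made-up example line: one male and two females.
example = "X\t100\t.\tA\tG\t50\tPASS\t.\tGT:DP\t1/1:10\t0/1:12\t1/1:9"
print(male_hom_alt_on_x(example, ["m1", "f1", "f2"],
                        {"m1": "M", "f1": "F", "f2": "F"}))  # ['m1']
```

Any sample this returns is a male that a sex-unaware count would treat as carrying two alternative alleles.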
A patch has been submitted and will appear after it passes the unit tests. Basically, you will need to run
vtools admin --validate_sex force-heterozygote
after vtools import and vtools phenotype, before running vtools update.
Dear
In several variants on the X-chromosome, we found a discrepancy between the allele count, as calculated using the #(alt) function in vtools update, and the raw data in the VCF file. It seems like vtools does not recognize that the variant is on the X-chromosome.
For example, in one of the variants on the X chromosome, #(alt) counted a total of 20 alternative alleles in our population. In the raw VCF file, there were 6 heterozygous females and 7 hemizygous males. In total, this should add up to 6+7=13 alternative alleles. Yet #(alt) counted 20 alleles.
It seems like #(alt) has disregarded the fact that the variant is on the X chromosome, so that the 7 hemizygous males are counted as homozygous for the alternative allele. Instead of 7 alternative alleles in the males, this would create 14 alternative alleles in the males. Together with the 6 alternative alleles in the heterozygous females, this adds up to 20.
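The arithmetic above can be reproduced with a small sketch. This is not how vtools computes #(alt) internally; the function, the (sex, genotype) tuples, and the assumption that the males are coded as 1/1 in the VCF are illustrative only.

```python
def count_alt(genotypes, sex_aware):
    """Count alternative alleles for one X-chromosome variant.

    genotypes: list of (sex, GT) pairs, e.g. ("M", "1/1") (hypothetical input).
    sex_aware: if True, a male carrier contributes at most one alt allele.
    """
    total = 0
    for sex, gt in genotypes:
        n_alt = gt.replace("|", "/").split("/").count("1")
        if sex_aware and sex == "M" and n_alt > 0:
            n_alt = 1  # hemizygous male: a single X copy
        total += n_alt
    return total


# 6 heterozygous females and 7 hemizygous males, the males coded as 1/1.
carriers = [("F", "0/1")] * 6 + [("M", "1/1")] * 7
print(count_alt(carriers, sex_aware=False))  # 20 = 6 + 2*7
print(count_alt(carriers, sex_aware=True))   # 13 = 6 + 7
```

The difference between the two counts equals the number of male carriers, which matches the gap between 20 and 13.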
The error is systematic across multiple rare variants on the X-chromosome. We did the comparison of the raw VCF to the #(alt) calculation for several rare variants on X, and consistently found that the allele count from #(alt) treats the male hemizygotes as homozygotes for the alternative allele.
We checked the coding of the males and the females and this was OK. The maf() function worked properly. In the raw VCF file, no male heterozygotes were observed.
Kind regards