vatlab / varianttools

software tool for the manipulation, annotation, selection, and analysis of variants in the context of next-gen sequencing analysis
https://vatlab.github.io/vat-docs/
GNU General Public License v3.0

Hemizygous variants in association test #54

Closed MBeyens closed 6 years ago

MBeyens commented 7 years ago

Dear

For several variants on the X chromosome, we found a discrepancy between the alternative allele count calculated with the #(alt) function in vtools update and the raw data in the VCF file. It seems that vtools does not recognize that the variant is on the X chromosome.

For example, for one of the variants on the X chromosome, #(alt) counted a total of 20 alternative alleles in our population. In the raw VCF file, there were 6 heterozygous females and 7 hemizygous males. In total, this should add up to 6+7=13 alternative alleles. Yet #(alt) counted 20 alleles.

It seems like #(alt) has disregarded the fact that the variant is on the X-chr, so that the 7 hemizygous males are counted as homozygous for the alternative allele. Instead of 7 alternative alleles in the males, this would create 14 alternative alleles in the males. Together with the 6 alternative alleles in the heterozygous females, this adds up to 20.
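The arithmetic above can be reproduced with a minimal Python sketch. The function names and the genotype encoding (diploid strings such as `0/1` and `1/1`, sex as `M`/`F`) are assumptions made for illustration; this is not how vtools computes #(alt) internally.

```python
def alt_count_naive(genotypes):
    """Count alternative alleles, treating every genotype as diploid."""
    return sum(gt.replace("|", "/").split("/").count("1") for gt, _sex in genotypes)


def alt_count_sex_aware(genotypes):
    """Count alternative alleles on chromosome X, treating males as hemizygous.

    A male carrying the alternative allele contributes one allele,
    even when the genotype is encoded as 1/1.
    """
    total = 0
    for gt, sex in genotypes:
        n_alt = gt.replace("|", "/").split("/").count("1")
        if sex == "M":
            n_alt = min(n_alt, 1)  # a male has a single X copy
        total += n_alt
    return total


# Hypothetical sample set matching the counts reported above:
# 6 heterozygous females and 7 males encoded as 1/1.
samples = [("0/1", "F")] * 6 + [("1/1", "M")] * 7

print(alt_count_naive(samples))      # 20
print(alt_count_sex_aware(samples))  # 13
```

The naive count gives 6 + 2×7 = 20, while capping each male at one allele gives the expected 6 + 7 = 13.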

The error is systematic across multiple rare variants on the X-chromosome. We did the comparison of the raw VCF to the #(alt) calculation for several rare variants on X, and consistently found that the allele count from #(alt) treats the male hemizygotes as homozygotes for the alternative allele.

We checked the coding of the males and the females and this was OK. The maf() function worked properly. In the raw VCF file, no male heterozygotes were observed.

Kind regards

BoPeng commented 7 years ago

This is a limitation/bug of vtools update, in that the calculation of #(alt) etc. does not make use of phenotype (sex) information. The manual states that

The maf() function treats chromosomes 1 to 22 as autosomes, X and Y as sex chromosomes, and other chromosomes as single-copy manifolds.

and does not mention anything similar for other functions.

To investigate a potential fix for this problem, I will need to know exactly how variants on the X chromosome are encoded in your VCF file. Could you upload (or PM bpeng@mdanderson.org if you prefer), for example, 1000 variants on the X chromosome from your data, the related phenotype file, and the commands that demonstrate the problem?

BoPeng commented 6 years ago

The problem is caused by incorrect encoding of genotypes of male individuals on chromosome X (coded as 1/1), so an admin command is needed to fix the input genotype.
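To check whether a dataset is affected by this encoding, something like the following Python sketch could be used. The record layout (sample, sex, chromosome, genotype) and the function name are hypothetical and chosen only for illustration; this is not the implementation of the admin command.

```python
def flag_male_x_homozygotes(records):
    """Return (sample, chrom, genotype) for males encoded as diploid 1/1 on X."""
    flagged = []
    for sample, sex, chrom, gt in records:
        alleles = gt.replace("|", "/").split("/")
        if sex == "M" and chrom in ("X", "chrX") and alleles == ["1", "1"]:
            flagged.append((sample, chrom, gt))
    return flagged


# Hypothetical records: only S1 is a male coded as 1/1 on chromosome X.
records = [
    ("S1", "M", "X", "1/1"),
    ("S2", "F", "X", "0/1"),
    ("S3", "M", "X", "1"),
    ("S4", "M", "1", "1/1"),
]

print(flag_male_x_homozygotes(records))
```

Any sample flagged this way would be counted twice by a diploid allele count, which is the discrepancy reported at the top of this issue.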

BoPeng commented 6 years ago

A patch has been submitted and will appear after it passes the unit tests. Basically, you will need to run

vtools admin --validate_sex force-heterozygote

after vtools import and vtools phenotype, before running vtools update.