zhanxw / rvtests

Rare variant test software for next generation sequencing data

Chromosome Y Data #96

Open KerryAnderson opened 4 years ago

KerryAnderson commented 4 years ago

When I analyse haploid chromosome Y SNPs, an additional copy of the reference allele appears to be added to each genotype, apparently so that the analysis can be carried out on "diploid" genotypes.

For example, where G is the reference allele and A is the alternative, an individual with a single G genotype is labelled GG in the outputs and an individual with a single A genotype is labelled GA. The only genotypes reported are homozygous reference and heterozygous; the homozygous-alternative column always shows 0:0:0.
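To make the pattern concrete, here is a minimal sketch of what the outputs look like to me. It is not Rvtests' code: `pad_with_ref`, `genotype_counts` and the sample calls are hypothetical, and the padding rule is only my assumption based on the outputs described above.

```python
from collections import Counter


def pad_with_ref(haploid_gt: str) -> str:
    """Hypothetical rule: prepend a reference allele (0) to a haploid call."""
    return f"0/{haploid_gt}"


def genotype_counts(haploid_calls):
    """Return (hom_ref, het, hom_alt) counts after padding each haploid call."""
    counts = Counter(pad_with_ref(gt) for gt in haploid_calls)
    return counts["0/0"], counts["0/1"], counts["1/1"]


# Made-up haploid calls: 0 = reference (G), 1 = alternative (A).
calls = ["0", "1", "0", "1", "1"]
print(genotype_counts(calls))  # (2, 3, 0)
```

Under this assumed rule the homozygous-alternative count can never be non-zero, which matches what I see in the outputs.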

The --set-hh-missing flag was used when generating the output from PLINK, and the .vcf file given to Rvtests as input also contains haploid genotypes.
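For anyone who wants to run the same check on their own input, the following is a rough sketch of how the ploidy of the GT fields can be tallied. It assumes an uncompressed, tab-delimited VCF with a FORMAT column containing GT; `gt_ploidy`, `ploidy_summary` and the example record are hypothetical and not part of PLINK or Rvtests.

```python
import re


def gt_ploidy(gt: str) -> int:
    """Number of alleles in a VCF GT string such as '0', '1', '0/1' or '0|1'."""
    return len(re.split(r"[/|]", gt))


def ploidy_summary(vcf_lines):
    """Count sample genotypes by ploidy across all non-header VCF lines."""
    summary = {}
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        gt_index = fields[8].split(":").index("GT")  # column 9 is FORMAT
        for sample in fields[9:]:
            n = gt_ploidy(sample.split(":")[gt_index])
            summary[n] = summary.get(n, 0) + 1
    return summary


# Made-up chromosome Y record with three haploid samples.
example = ["Y\t2655180\trs_example\tG\tA\t.\t.\t.\tGT\t0\t1\t0"]
print(ploidy_summary(example))  # {1: 3}
```

To check a real file, the lines can be passed in with `ploidy_summary(open("input.vcf"))`, where `input.vcf` is a placeholder path. A result containing only the key 1 would indicate that every genotype in the file is haploid. Note that a missing call written as `.` is counted as haploid by this sketch.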

My main question is: are my results still valid even though an additional reference allele is being added to each genotype? If not, is there anything I can do to stop the software "inferring" this "missing" allele?

Thanks