seppinho / haplogrep-cmd

HaploGrep - mtDNA haplogroup classification. Supporting rCRS and RSRS.
https://haplogrep.i-med.ac.at/
MIT License

Microarrays and missing data, `--chip` #32

Open stephenturner opened 4 years ago

stephenturner commented 4 years ago

As described in the README, the `--chip` flag limits the range to array SNPs only. How are missing genotypes handled in a multisample VCF file?

For example, my multisample VCF file contains positions 1, 2, 3, 4, 5, but one particular sample is missing data (genotype not called) at positions 2 and 4. In the results, the range still appears to be listed as 1, 2, 3, 4, 5 for all samples, including the one with missing data. Is this the expected behavior, and is it the desired behavior?

I could limit the range to 1, 3, 5 for this particular sample by writing a new VCF that extracts just this sample and removes its missing genotypes altogether. But that wouldn't allow me to analyze multiple samples at once, each missing a particular subset of the 5 positions on my hypothetical array.

cc @vpnagraj @cneal13
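To make the workaround concrete, here is a minimal sketch in plain Python of splitting a multisample VCF into per-sample VCFs while dropping records where that sample's genotype is missing. It is not part of HaploGrep and says nothing about how HaploGrep handles missing calls internally. The function names `split_by_sample` and `called_positions` are hypothetical, and the code assumes tab-delimited records with a `GT` key in the FORMAT column and treats any `.` allele as a missing call.

```python
def split_by_sample(vcf_lines):
    """Return {sample_name: [vcf lines]} with one single-sample VCF per sample.

    Records where the sample's GT contains a missing allele (".") are dropped
    from that sample's output only. Assumes every record has GT in FORMAT.
    """
    meta, samples, out = [], [], {}
    for raw in vcf_lines:
        line = raw.rstrip("\n")
        if line.startswith("##"):
            meta.append(line)                      # keep meta-information lines
        elif line.startswith("#CHROM"):
            cols = line.split("\t")
            samples = cols[9:]                     # sample columns start at index 9
            for s in samples:
                out[s] = meta + ["\t".join(cols[:9] + [s])]
        elif line:
            cols = line.split("\t")
            fixed, calls = cols[:9], cols[9:]
            gt_idx = fixed[8].split(":").index("GT")
            for s, call in zip(samples, calls):
                gt = call.split(":")[gt_idx]
                alleles = gt.replace("|", "/").split("/")
                if "." in alleles:                 # genotype not called: skip
                    continue
                out[s].append("\t".join(fixed + [call]))
    return out


def called_positions(single_sample_lines):
    """Positions (POS column) that remain in a single-sample VCF."""
    return [int(l.split("\t")[1]) for l in single_sample_lines
            if not l.startswith("#")]


# Hypothetical 5-position array: S1 is missing calls at positions 2 and 4.
header = ["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO",
          "FORMAT", "S1", "S2"]
s1_calls = ["1", ".", "0", ".", "1"]
s2_calls = ["0", "1", "0", "1", "0"]
example = ["##fileformat=VCFv4.2", "\t".join(header)]
for pos, (c1, c2) in enumerate(zip(s1_calls, s2_calls), start=1):
    example.append("\t".join(
        ["MT", str(pos), ".", "A", "G", ".", "PASS", ".", "GT", c1, c2]))

per_sample = split_by_sample(example)
for name, lines in per_sample.items():
    print(name, called_positions(lines))
```

Each entry of `per_sample` could then be written to its own file and classified separately, which is exactly the per-sample overhead the question is hoping to avoid.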