Closed mbhall88 closed 3 years ago
Does "if GT 1 and GT 2 are 2 SNPs different from each other" mean the edit distance between alleles 1 and 2 is 2?
Or do you mean we have a triallelic site, so GT 1 and GT 2 are both SNPs different from the ref?
Yes, the edit distance between them is 2.
Right, so I'd ignore any non-SNP variant to match PHE and the field. We're doing a SNP distance of clockwise SNPs, and the edit-distance-2 changes might have occurred in one event.
So the decision from the meeting is to try genotype distance and see how that works.
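For illustration, ignoring non-SNP variants could look something like the sketch below. The definition of "SNP site" used here (ref and every alt are a single A/C/G/T base) is an assumption, as are the function name `is_snp_site` and the toy `sites` list; none of it comes from the actual pipeline.

```python
# Assumed definition: a site counts as a SNP site only when the ref and
# every alt allele are exactly one unambiguous base.
BASES = {"A", "C", "G", "T"}


def is_snp_site(ref, alts):
    """Return True if ref and all alts are single A/C/G/T bases."""
    return all(allele.upper() in BASES for allele in [ref, *alts])


# Hypothetical (ref, alts) pairs, as they might be read from VCF records.
sites = [
    ("A", ["T"]),       # biallelic SNP: kept
    ("A", ["T", "G"]),  # triallelic SNP: kept
    ("AC", ["A"]),      # deletion: dropped
    ("A", ["ACG"]),     # insertion: dropped
]

snp_sites = [site for site in sites if is_snp_site(*site)]
```

Only the first two toy sites survive the filter, so indels never contribute to the distance.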
One thing we spoke about when we originally discussed this, @iqbal-lab, was that when doing the pairwise distances from the compare VCF, it might be best to encode the GT field in a matrix and calculate the distances from that.
One (potential) problem I see with this approach (although this may be a feature?) is that if, say, sample A has GT 1 and sample B has GT 2, we give them a distance of 1. However, if GT 1 and GT 2 are 2 SNPs different from each other, their distance, in the conventional sense, would be 2.
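To make the difference concrete, here is a minimal sketch comparing the two ways of counting. The samples, allele strings, and function names are all made up for illustration, and `None` stands in for a null call. It assumes alleles at a site have equal length so a simple Hamming distance applies.

```python
# Hypothetical GT matrix: one row per sample, one column per site.
# 0 = ref, 1..n = alt alleles, None = missing/null call.
genotypes = {
    "A": [0, 1, 1],
    "B": [0, 2, 1],
    "C": [1, 2, None],
}

# Hypothetical alleles per site; index 0 is the ref allele.
# At the second site, alt 1 ("AT") and alt 2 ("GC") differ at two bases.
alleles = [
    ["A", "T"],
    ["AC", "AT", "GC"],
    ["G", "C"],
]


def genotype_distance(gt_a, gt_b):
    """Count sites where both samples are called and their GTs differ."""
    return sum(
        1
        for a, b in zip(gt_a, gt_b)
        if a is not None and b is not None and a != b
    )


def hamming(x, y):
    """Number of mismatching positions between equal-length alleles."""
    if len(x) != len(y):
        raise ValueError("alleles must have equal length")
    return sum(cx != cy for cx, cy in zip(x, y))


def allele_distance(gt_a, gt_b, site_alleles):
    """Sum per-site base differences between the called alleles."""
    total = 0
    for a, b, site in zip(gt_a, gt_b, site_alleles):
        if a is None or b is None:
            continue  # skip sites with a missing call in either sample
        total += hamming(site[a], site[b])
    return total


gt_dist_ab = genotype_distance(genotypes["A"], genotypes["B"])
snp_dist_ab = allele_distance(genotypes["A"], genotypes["B"], alleles)
```

For samples A and B, the genotype distance is 1 (they differ only at the second site), while the allele-level distance is 2, which is the discrepancy described above.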