szhan / onekg_analysis

Evaluation of genotype imputation methods using the unified genealogy dataset
MIT License

Test sensitivity of sample matching to exact values of mismatch probabilities and switch probabilities #5

Closed szhan closed 1 year ago

szhan commented 1 year ago

Right now, the mismatch and switch probabilities are set to small values of 1e-8 and 1e-20, respectively. A preliminary analysis showed that the precision used when computing HMM path likelihood values can noticeably affect the results (e.g., going from 10 to 20). So the exact values chosen for the mismatch and switch probabilities may have an effect too. It would be good to check this by setting the mismatch and switch probabilities to, say, 1e-6 and 1e-18, respectively, or possibly higher. It should be quick to test on a handful of sample genomes anyway.
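To make the proposed check concrete, here is a minimal sketch of such a sensitivity test on a toy panel. It is not the matching code used in this repo: `viterbi_path`, the two-haplotype `panel`, and the `query` are all made up for illustration, and the model is a plain haploid Li-Stephens-style HMM in log space in which a switch lands uniformly on any reference haplotype.

```python
import math


def viterbi_path(query, panel, mismatch, switch):
    """Most likely copying path of `query` through the rows of `panel`.

    Hypothetical helper: a simple log-space Viterbi, not the repo's matcher.
    """
    n = len(panel)
    log_match = math.log1p(-mismatch)
    log_mis = math.log(mismatch)
    # With probability `switch` the path jumps to a uniformly chosen haplotype
    # (possibly the same one), otherwise it stays where it is.
    log_stay = math.log1p(-switch + switch / n)
    log_jump = math.log(switch / n)

    def log_emit(h, site):
        return log_match if panel[h][site] == query[site] else log_mis

    score = [log_emit(h, 0) - math.log(n) for h in range(n)]
    pointers = []
    for site in range(1, len(query)):
        best_prev = max(range(n), key=score.__getitem__)
        new_score, back = [], []
        for h in range(n):
            stay = score[h] + log_stay
            jump = score[best_prev] + log_jump
            prev, value = (h, stay) if stay >= jump else (best_prev, jump)
            new_score.append(value + log_emit(h, site))
            back.append(prev)
        score = new_score
        pointers.append(back)

    # Trace back from the best final state.
    state = max(range(n), key=score.__getitem__)
    path = [state]
    for back in reversed(pointers):
        state = back[state]
        path.append(state)
    path.reverse()
    return path


# Toy data (assumed): two reference haplotypes and a query that looks like
# a recombinant of the two.
panel = [[0] * 7, [1] * 7]
query = [0, 0, 0, 0, 1, 1, 1]

# (mismatch, switch) pairs: the current values and the proposed alternatives.
settings = [(1e-8, 1e-20), (1e-6, 1e-18)]
paths = {s: viterbi_path(query, panel, *s) for s in settings}
for (mismatch, switch), path in paths.items():
    print(f"mismatch={mismatch:g} switch={switch:g} path={path}")
```

On this toy input the two settings select different paths: a single switch under 1e-8 / 1e-20, and three mismatches with no switch under 1e-6 / 1e-18. That only shows the choice can matter in principle; whether it matters on real sample genomes is what the test above would need to establish.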

szhan commented 1 year ago

It turns out that MMR is not simply rho / mu. The relationship is more complicated, and it is explained a bit in #20. I ran an experiment in which I set the mutation rate to 1e-8 and to 1e-7 and ran the matching with MMR computed using the right formula. I then compared the imputed alleles from the two runs. More paths were identical (in terms of the number of correct parent node ids) than when simply maintaining rho / mu. There are, however, still some discrepancies, which may be due to numerical instability.
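For reference, a comparison of this kind can be sketched as below. This is only an illustration under assumptions: the function names, the dictionary layout (sample id mapped to a per-site list of parent node ids), and the example values are hypothetical and not taken from the actual analysis.

```python
def path_concordance(path_a, path_b):
    """Fraction of sites at which two copying paths use the same parent node id."""
    if len(path_a) != len(path_b):
        raise ValueError("paths must cover the same number of sites")
    if not path_a:
        raise ValueError("paths must not be empty")
    same = sum(a == b for a, b in zip(path_a, path_b))
    return same / len(path_a)


def count_identical_paths(run_a, run_b):
    """Number of samples whose paths are identical in both runs."""
    shared = run_a.keys() & run_b.keys()
    return sum(run_a[s] == run_b[s] for s in shared)


# Hypothetical paths from two runs (e.g. mutation rate 1e-8 vs 1e-7).
run_mu_1e8 = {"sample_0": [3, 3, 5, 5], "sample_1": [2, 2, 2, 2]}
run_mu_1e7 = {"sample_0": [3, 3, 5, 7], "sample_1": [2, 2, 2, 2]}

print(count_identical_paths(run_mu_1e8, run_mu_1e7))
for sample in sorted(run_mu_1e8):
    print(sample, path_concordance(run_mu_1e8[sample], run_mu_1e7[sample]))
```

Reporting per-sample concordance alongside the count of fully identical paths helps separate a few isolated parent differences from paths that diverge over long stretches.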