odelaneau / shapeit5

Segmented HAPlotype Estimation and Imputation Tool
https://odelaneau.github.io/shapeit5/
MIT License

Getting the posterior probability distribution of haplotype estimations? #38

Closed (closed by EmiliaCXY 1 year ago)

EmiliaCXY commented 1 year ago

Hello! Thanks for developing such a great tool!

I was wondering whether it would be possible to get the posterior probability distribution of the haplotype estimate at each SNP. I am trying to calculate haplotype frequencies from a set of phased SNPs. I read that phasing is done by sampling from a posterior distribution P(H|D) under a Li and Stephens model (LSM), and I think it would be beneficial to incorporate the phasing probabilities into my calculation.

I understand that these probabilities are computed in an intermediate step that may not be written to the output, but I was wondering whether you had any insights into how I might access them, and whether my idea makes sense to you.

Thank you so much for your time and efforts!

Best, Emilia

odelaneau commented 1 year ago

This information is unfortunately not provided by shapeit. It's actually quite difficult to get due to the way computations are done.

In shapeit4, you can output an intermediate data structure (called the graphs) that might help you with this. The code is here: https://github.com/odelaneau/shapeit4/tree/master/tools/bingraphsample, but it is not documented. Email Robin Hofmeister at UNIL; I know he has scripts to run all this.
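Suppose per-individual phase probabilities could be obtained, for example by post-processing the shapeit4 graphs mentioned above. The weighting Emilia describes could then be computed as expected haplotype frequencies: each candidate phase configuration of an individual contributes its two haplotypes in proportion to its probability. The sketch below is only an illustration under that assumption. The input format (a list of `(haplotype_1, haplotype_2, probability)` tuples per individual) and the function name `expected_haplotype_frequencies` are hypothetical. They are not part of shapeit or its tools.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical input format (not produced by shapeit): for each individual,
# a list of candidate phase configurations (haplotype_1, haplotype_2, probability),
# where each haplotype is a string of alleles over the SNP set, e.g. "0110".
PhaseConfigs = List[Tuple[str, str, float]]


def expected_haplotype_frequencies(samples: List[PhaseConfigs]) -> Dict[str, float]:
    """Expected haplotype frequencies, weighting each phase configuration by its probability."""
    counts: Dict[str, float] = defaultdict(float)
    for configs in samples:
        total = sum(p for _, _, p in configs)
        if total <= 0:
            raise ValueError("phase probabilities must sum to a positive value")
        for hap1, hap2, p in configs:
            weight = p / total  # renormalise in case the probabilities do not sum exactly to 1
            counts[hap1] += weight
            counts[hap2] += weight
    n_haplotypes = 2 * len(samples)  # each diploid individual carries two haplotypes
    return {hap: c / n_haplotypes for hap, c in counts.items()}
```

For example, one double heterozygote that is phased as `01|10` with probability 0.9 and as `00|11` with probability 0.1, plus one `00|00` homozygote, gives `01` and `10` an expected frequency of 0.225 each. By contrast, counting only the most likely phasing would give each of them 0.25.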