jmschrei / pomegranate

Fast, flexible and easy to use probabilistic modelling in Python.
http://pomegranate.readthedocs.org/en/latest/
MIT License
3.29k stars, 591 forks

HMM state probabilities #995

Closed malonzm1 closed 1 year ago

malonzm1 commented 1 year ago

Hi!

Is there a way to find the state probabilities for each step in the sequence?

Thanks and good day.

teoML commented 1 year ago

Hi, if you are talking about an HMM, you can use predict_proba(); each component of the returned vector corresponds to a state id. It uses the forward-backward algorithm. If you want only the forward probabilities, you can use hmm_model.forward().

malonzm1 commented 1 year ago

Thanks. I tried predict_proba(), but when I looked at the selected states, they don't necessarily correspond to the highest probabilities (though most do). What am I missing?

Thanks.

jmschrei commented 1 year ago

I'm not sure if you're saying that the selected states from model.predict don't match the highest probabilities from model.predict_proba, or if the highest probability states in model.predict_proba don't match the highest probability states in model.forward, so I'll answer both.

First, the forward algorithm begins at the start of the sequence, aligning observations to states in the model. Each probability in the returned matrix is the probability of starting at the beginning of the sequence and aligning observations to any state in the model, over any path through the model, to eventually align this observation to this state. The backward algorithm works much the same way, except it begins by aligning the final observation to the end state and goes backwards from there. The forward_backward algorithm, wrapped by predict_proba, combines these probabilities and then normalizes them per-observation. It's basically saying, "given all paths of aligning observations to states up until this point, and all paths aligning observations to states after this point, what state is most likely for this observation?" It has information that the forward algorithm does not have access to, namely the observations that come after the current one.
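To make the difference between the forward probabilities and the forward-backward probabilities concrete, here is a minimal NumPy sketch. It is not pomegranate's implementation: the two-state model, its parameters, and the names `start`, `trans`, `emit`, `filtered`, and `smoothed` are all made up for illustration, and the sketch assumes a model without an explicit end state and works with raw probabilities rather than log probabilities.

```python
import numpy as np

# Hypothetical two-state HMM over two discrete symbols (0 and 1).
# All numbers are illustrative assumptions, not taken from pomegranate.
start = np.array([0.6, 0.4])      # P(z_0 = i)
trans = np.array([[0.7, 0.3],     # P(z_t = j | z_{t-1} = i)
                  [0.4, 0.6]])
emit = np.array([[0.9, 0.1],      # P(x_t = k | z_t = i)
                 [0.2, 0.8]])


def forward(obs):
    """alpha[t, i] = P(x_0..x_t, z_t = i), summed over all paths ending in state i."""
    alpha = np.zeros((len(obs), len(start)))
    alpha[0] = start * emit[:, obs[0]]
    for t in range(1, len(obs)):
        alpha[t] = (alpha[t - 1] @ trans) * emit[:, obs[t]]
    return alpha


def backward(obs):
    """beta[t, i] = P(x_{t+1}..x_{T-1} | z_t = i); no explicit end state is assumed."""
    beta = np.ones((len(obs), len(start)))
    for t in range(len(obs) - 2, -1, -1):
        beta[t] = trans @ (emit[:, obs[t + 1]] * beta[t + 1])
    return beta


obs = [0, 0, 1, 1, 0]
alpha, beta = forward(obs), backward(obs)

# Forward only: state probabilities given the observations seen so far.
filtered = alpha / alpha.sum(axis=1, keepdims=True)

# Forward-backward: state probabilities given the whole sequence,
# normalized per observation.
smoothed = alpha * beta
smoothed /= smoothed.sum(axis=1, keepdims=True)

print(np.round(filtered, 3))
print(np.round(smoothed, 3))
```

In this sketch the last rows of `filtered` and `smoothed` coincide, because no later observations exist at the final step; earlier rows can differ because the backward pass brings in evidence from the rest of the sequence.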

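Second, the algorithm in model.predict is the Viterbi algorithm, which returns the single maximum-likelihood path through the model, whereas model.predict_proba returns per-observation probabilities from the forward-backward algorithm. Because one optimizes the path as a whole and the other sums over all paths at each position, the state on the Viterbi path is not always the state with the highest probability at that position.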
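A small hand-checkable case shows how the two can disagree. The sketch below is standalone NumPy, not pomegranate code; the model is a deliberately artificial assumption (uniform emissions, so only the start and transition probabilities matter), and `viterbi` and `posteriors` are hypothetical helper names. It uses raw probabilities for readability; a real implementation would normally work in log space.

```python
import numpy as np

# Hypothetical two-state HMM chosen so the arithmetic can be checked by hand.
start = np.array([0.4, 0.6])
trans = np.array([[1.0, 0.0],     # state 0 always stays in state 0
                  [0.5, 0.5]])    # state 1 splits evenly
emit = np.array([[0.5, 0.5],      # uniform emissions: observations carry no information
                 [0.5, 0.5]])


def viterbi(obs):
    """Return the single most likely state path."""
    n, k = len(obs), len(start)
    delta = np.zeros((n, k))              # best path probability ending in each state
    back = np.zeros((n, k), dtype=int)    # best predecessor for each state
    delta[0] = start * emit[:, obs[0]]
    for t in range(1, n):
        scores = delta[t - 1][:, None] * trans   # scores[i, j]: best path to i, then i -> j
        back[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) * emit[:, obs[t]]
    path = [int(delta[-1].argmax())]
    for t in range(n - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]


def posteriors(obs):
    """Return per-observation state probabilities via forward-backward."""
    n, k = len(obs), len(start)
    alpha = np.zeros((n, k))
    beta = np.ones((n, k))
    alpha[0] = start * emit[:, obs[0]]
    for t in range(1, n):
        alpha[t] = (alpha[t - 1] @ trans) * emit[:, obs[t]]
    for t in range(n - 2, -1, -1):
        beta[t] = trans @ (emit[:, obs[t + 1]] * beta[t + 1])
    gamma = alpha * beta
    return gamma / gamma.sum(axis=1, keepdims=True)


obs = [0, 1]
viterbi_path = viterbi(obs)
gamma = posteriors(obs)
posterior_path = gamma.argmax(axis=1).tolist()

print(viterbi_path)        # [0, 0]
print(posterior_path)      # [1, 0]
print(np.round(gamma, 3))  # rows: [0.4, 0.6] and [0.7, 0.3]
```

Ignoring the constant emission factor, the three possible paths have probabilities 0.4 for (0, 0), 0.3 for (1, 0), and 0.3 for (1, 1). The best single path is (0, 0), yet state 1 has total probability 0.6 at the first step, so the per-observation argmax picks state 1 there.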

malonzm1 commented 1 year ago

Thanks! It is the first: the selected states from model.predict don't match the highest probabilities from model.predict_proba. I don't suppose there's a function similar to model.predict_proba that outputs state probabilities for the Viterbi algorithm used by model.predict?

jmschrei commented 1 year ago

Thank you for opening an issue. pomegranate has recently been rewritten from the ground up to use PyTorch instead of Cython (v1.0.0), and so all issues are being closed as they are likely out of date. Please re-open or start a new issue if a related issue is still present in the new codebase.