jmschrei / pomegranate

Fast, flexible and easy to use probabilistic modelling in Python.
http://pomegranate.readthedocs.org/en/latest/
MIT License
3.29k stars, 591 forks

Which HMM method returns a result for "What is the probability that the model generated the sequence" ? #994

teoML closed this issue 1 year ago

teoML commented 1 year ago

Hi, I trained several HMMs and am now looking for the method that gives the probability that a trained HMM generated a given sequence. My goal is to find out which model most probably generated the sequence. Which method should I use?

jmschrei commented 1 year ago

model.probability will give you P(sequence|model). Using Bayes' rule, and assuming each model is equally likely a priori, you can just calculate model.probability for each HMM and choose the one with the highest probability.
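
The selection step described above can be sketched as follows. This is a minimal illustration rather than pomegranate code: `pick_most_likely_model` is a hypothetical helper, and the scores in `example_scores` are made-up placeholders standing in for whatever log P(sequence|model) each trained model reports. With equal priors, ranking by posterior is the same as ranking by likelihood, so an argmax is enough. The sketch works with logs because raw probabilities of long sequences tend to be too small to compare reliably.

```python
import math

def pick_most_likely_model(log_likelihoods, log_priors=None):
    """Return (best_name, posteriors) given log P(sequence | model) per model.

    log_likelihoods: dict mapping a model name to log P(sequence | model).
    log_priors: optional dict of log P(model); equal priors are assumed if omitted.
    """
    names = list(log_likelihoods)
    if log_priors is None:
        log_priors = {name: 0.0 for name in names}  # equal (unnormalised) priors

    # Unnormalised log posterior: log P(seq | model) + log P(model).
    scores = {n: log_likelihoods[n] + log_priors[n] for n in names}

    # Normalise with the log-sum-exp trick so tiny likelihoods do not underflow.
    top = max(scores.values())
    log_norm = top + math.log(sum(math.exp(s - top) for s in scores.values()))
    posteriors = {n: math.exp(scores[n] - log_norm) for n in names}

    best = max(scores, key=scores.get)
    return best, posteriors

# Placeholder scores for illustration only, NOT real output from any model.
example_scores = {"hmm_a": -1234.5, "hmm_b": -1229.1, "hmm_c": -1302.7}
best_name, posteriors = pick_most_likely_model(example_scores)
print(best_name, posteriors)
```

If the models are not equally likely beforehand, passing `log_priors` shifts the scores accordingly; the argmax is then taken over the log posterior instead of the log likelihood.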

teoML commented 1 year ago

Thanks for your reply, @jmschrei! To avoid confusion: I want to calculate P(sequence|model), but the sequence does not consist of state symbols. It is a sequence of sensor observation vectors, so the input to the function will be a list of feature vectors such as [[0.4, 0.3, 0.4], [0.5, 0.6, 0.1], ...] (the same format as the sequences used for training the HMM from samples). From what I understand from the documentation, the input argument for model.probability() is a sequence of symbols, although I think the documentation is not very clear on this point:

probability()

Return the probability of the given symbol under this distribution.

Parameters

    symbol : object

        The symbol to calculate the probability of

Returns

    probability : double

        The probability of that point under the distribution.

It is not clear to me what "Return the probability of the given symbol under this distribution." means. What is a symbol? And which distribution?

Is there such a method implemented, or will I first have to compute the predicted sequence of symbols using predict() and then pass the resulting sequence of symbols to model.probability()?

By the way, is model.log_probability() doing the same as model.probability() but returning the log? I tried applying np.exp() to its result, and it does not give the same number as model.probability(seq).

teoML commented 1 year ago

@jmschrei, by the way, I created a sequence of length 1000 using hmm.sample(length=1000), and calling hmm_model.probability(input_sampled_seq) on it returns a probability of 0. How is that possible?

jmschrei commented 1 year ago

> It is not clear to me what "Return the probability of the given symbol under this distribution." means. What is a symbol? And which distribution?

Sorry for the confusion. A "symbol" is just an observation, e.g., your vector of measurements. It is not the state assignment. This method does not assign a state to each observation: it calculates the log probability summed over all paths through the model. I don't know where you got that documentation.
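
To make the "summed over all paths" idea concrete, here is a minimal sketch of the forward algorithm in log space, checked against explicit enumeration of every hidden-state path. It is not pomegranate's implementation. The two-state model and its parameters are invented for illustration, emissions are discrete for brevity (with continuous feature vectors the emission term would be a density instead), and the toy model has no explicit end state.

```python
import math
from itertools import product

def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    top = max(values)
    if top == -math.inf:
        return -math.inf
    return top + math.log(sum(math.exp(v - top) for v in values))

def forward_log_probability(observations, log_start, log_trans, log_emit):
    """log P(observations | model) for a non-empty sequence, summed over all paths."""
    n_states = len(log_start)
    # alpha[j] = log P(observations up to now, current state == j)
    alpha = [log_start[j] + log_emit[j][observations[0]] for j in range(n_states)]
    for obs in observations[1:]:
        alpha = [
            log_sum_exp([alpha[i] + log_trans[i][j] for i in range(n_states)])
            + log_emit[j][obs]
            for j in range(n_states)
        ]
    return log_sum_exp(alpha)

def brute_force_probability(observations, start, trans, emit):
    """Same quantity by enumerating every state path (only viable for tiny inputs)."""
    total = 0.0
    for path in product(range(len(start)), repeat=len(observations)):
        p = start[path[0]] * emit[path[0]][observations[0]]
        for t in range(1, len(observations)):
            p *= trans[path[t - 1]][path[t]] * emit[path[t]][observations[t]]
        total += p
    return total

def to_log(rows):
    return [[math.log(x) for x in row] for row in rows]

# Toy two-state model with made-up parameters.
start = [0.6, 0.4]
trans = [[0.7, 0.3], [0.4, 0.6]]
emit = [[0.9, 0.1], [0.2, 0.8]]   # emit[state][symbol]
obs = [0, 1, 1, 0]                # observed symbols, not state labels

log_p = forward_log_probability(obs, [math.log(x) for x in start], to_log(trans), to_log(emit))
print(log_p, math.exp(log_p), brute_force_probability(obs, start, trans, emit))
```

No state labels are passed in: the hidden states are marginalised out, which is why the input is only the observed sequence.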

> Is there such a method implemented, or will I first have to compute the predicted sequence of symbols using predict() and then pass the resulting sequence of symbols to model.probability()?

You can use model.probability(X) where X is your observed sequence.

> By the way, is model.log_probability() doing the same as model.probability() but returning the log? I tried applying np.exp() to its result, and it does not give the same number as model.probability(seq).

Yes. You can see that here: https://github.com/jmschrei/pomegranate/blob/master/pomegranate/base.pyx#L180
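
The thread leaves the earlier question about a probability of 0 for a sampled sequence of length 1000 unanswered. One plausible explanation, offered here as an assumption and not confirmed by the maintainer, is floating-point underflow: the probability of a long sequence is a product of many factors, and once it falls below the smallest representable double, exponentiating the log probability yields exactly 0. The sketch below uses an invented per-observation factor of 0.05 purely to show the effect; it does not call pomegranate.

```python
import math
import sys

# Assumed for illustration: each observation contributes a factor of 0.05 on average.
per_step_probability = 0.05

# Short sequence: exp(log probability) recovers the probability as expected.
short_log = 5 * math.log(per_step_probability)
print(math.exp(short_log), per_step_probability ** 5)

# Long sequence: the log probability is finite, but its exponential underflows.
length = 1000
log_probability = length * math.log(per_step_probability)
probability = math.exp(log_probability)

print(log_probability)     # a large negative but finite number
print(probability)         # 0.0 because the true value is below double precision
print(sys.float_info.min)  # smallest positive normalised double, for comparison
```

If this is what happened, comparing models by log probability directly, without exponentiating, sidesteps the problem.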