jmschrei / pomegranate

Fast, flexible and easy to use probabilistic modelling in Python.
http://pomegranate.readthedocs.org/en/latest/
MIT License
3.34k stars, 588 forks

Compute likelihood of sequence and states under Gaussian HMM #233

Closed: alimanfoo closed this issue 7 years ago

alimanfoo commented 7 years ago

Apologies for the newbie question, but I have a Gaussian HMM for which the transition matrix and emission probabilities are known, and I would like to compute the likelihood for a particular sequence of data given a particular sequence of hidden states. The log_probability() method on the HiddenMarkovModel class says it can do this if a path is provided, but how do I specify the path?

Thanks in advance.

jmschrei commented 7 years ago

Howdy

Looks like I actually removed that feature without updating the documentation. Sorry about that!

Until I re-add it, you can use the following code to do it:

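import numpy

model = HiddenMarkovModel...
path = [list of state objects]
# sequence holds the observations, one per non-silent state in path
# Look up the index of each path state in model.states
path_idxs = [model.states.index(state) for state in path]

logp = 0
# First add in emissions (silent states have no distribution and emit nothing)
j = 0
for state in path:
    if state.distribution is not None:
        logp += state.distribution.log_probability(sequence[j])
        j += 1

# Now add in the transitions (a zero-probability transition gives log(0) = -inf)
trans = numpy.log(model.dense_transition_matrix())
for start, end in zip(path_idxs[:-1], path_idxs[1:]):
    logp += trans[start, end]

print(logp)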

Note: This is untested pseudocode I just hand-wrote, so there may be some minor issues with it. Let me know if there are major issues! It may be memory-inefficient for large models, since it builds the full dense transition matrix instead of using the internal sparse representation.
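For anyone who wants to sanity-check the number without going through the library, here is a minimal library-free sketch of the same computation: the joint log-probability of a sequence and a given path is the log start probability of the first state, plus the log transition probabilities along the path, plus the log emission densities. It assumes univariate Gaussian emissions and states identified by integer index. The function names and the toy two-state parameters are hypothetical and are not part of pomegranate's API.

```python
import math


def gaussian_logpdf(x, mean, std):
    """Log density of a univariate normal distribution at x."""
    var = std * std
    return -0.5 * math.log(2.0 * math.pi * var) - (x - mean) ** 2 / (2.0 * var)


def safe_log(p):
    """log(p), returning -inf for a zero probability instead of raising."""
    return math.log(p) if p > 0 else float("-inf")


def path_log_likelihood(sequence, path, start_probs, trans, means, stds):
    """Joint log-probability of the observations and an explicit state path.

    sequence: list of floats, one observation per time step
    path: list of integer state indices, same length as sequence
    start_probs[i]: probability of starting in state i
    trans[i][j]: probability of moving from state i to state j
    means[i], stds[i]: Gaussian emission parameters of state i
    """
    if len(sequence) != len(path):
        raise ValueError("sequence and path must have the same length")
    if not path:
        return 0.0

    # Start probability of the first state
    logp = safe_log(start_probs[path[0]])

    # Transitions between consecutive states on the path
    for prev, cur in zip(path[:-1], path[1:]):
        logp += safe_log(trans[prev][cur])

    # Emission of each observation from its assigned state
    for x, state in zip(sequence, path):
        logp += gaussian_logpdf(x, means[state], stds[state])

    return logp


# Hypothetical two-state model, for illustration only
start_probs = [0.5, 0.5]
trans = [[0.9, 0.1],
         [0.2, 0.8]]
means = [0.0, 5.0]
stds = [1.0, 1.0]

sequence = [0.0, 5.0]
path = [0, 1]

logp = path_log_likelihood(sequence, path, start_probs, trans, means, stds)
print(logp)
```

Unlike the snippet above, this version has no silent states, so every state on the path consumes exactly one observation, and the start probability is passed in explicitly rather than read from the model's start state.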

alimanfoo commented 7 years ago

Thank you, much appreciated.
