jmschrei / pomegranate

Fast, flexible and easy to use probabilistic modelling in Python.
http://pomegranate.readthedocs.org/en/latest/
MIT License

HMM question about general initialization of transition matrix, emission matrix, .. etc #190

Closed guerrajorge closed 8 years ago

guerrajorge commented 8 years ago

Hello Jacob,

I was wondering if there was an update on the implementation of k-means, etc., to initialize and update the state distributions and the transition and emission matrices? Something along the lines of what was discussed in #143.

Thank you.

jmschrei commented 8 years ago

Howdy

Unfortunately no. I've been working on Bayesian networks recently, which were underloved but of intense interest to a lot of people. I'll ping you when I make progress on that front though-- and HMM structure learning is high on my list of things to add in.

guerrajorge commented 8 years ago

I understand and great work!

B4marc commented 6 years ago

Hi, can someone tell me how the emission matrix is defined if you use an HMM with MultivariateGaussianDistribution? Is there a way to print the emission matrix from an HMM built with `from_samples` (like `model.dense_transition_matrix()` for the transition matrix)? I would like to create a very simplified HMM where each observation has a probability of 1 of being found in one specific state. Is this possible to define? All the best, Marc

jmschrei commented 6 years ago

Howdy

I'm not entirely sure what you're asking. If you want to build a MultivariateGaussianDistribution you can pass in the vector of means and the covariance matrix, like `MultivariateGaussianDistribution(mu, cov)`. Printing it is more difficult, but it can typically be done by iterating over each distribution in the model and printing it out. If this is an HMM, you can use `[state.distribution for state in model.states]`.
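For concreteness, the density that `MultivariateGaussianDistribution(mu, cov)` represents is the standard multivariate normal. A minimal numpy sketch of that log-density (an illustration of the math, not pomegranate's own implementation) is:

```python
import numpy as np

def mvn_logpdf(x, mu, cov):
    """Log-density of a multivariate normal N(mu, cov) at point x."""
    x, mu = np.asarray(x, float), np.asarray(mu, float)
    cov = np.asarray(cov, float)
    d = mu.shape[0]
    diff = x - mu
    # Solve a linear system instead of inverting cov explicitly,
    # which is cheaper and numerically more stable.
    maha = diff @ np.linalg.solve(cov, diff)
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (d * np.log(2 * np.pi) + logdet + maha)

# At the mean of a 2-D standard normal the density is exactly 1 / (2*pi).
p = np.exp(mvn_logpdf([0.0, 0.0], [0.0, 0.0], np.eye(2)))
```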

In the model you'd like to make, are your observations continuous values (like 7.4) or symbols (like 'E')?

B4marc commented 6 years ago

Hi jmschrei, in #157 you already clarified my understanding of the HMM components and the calculation of the probabilities. As you mentioned, the two components of an HMM are the emission matrix and the transition matrix. My comment in this issue is about the emission matrix. My observations are continuous values with 4 dimensions and 3 states.

Is there a way to print the emission matrix from an HMM built with `from_samples` (like `model.dense_transition_matrix()` for the transition matrix)?

I guess the forward-backward algorithm is what I was searching for. To explain more specifically what I meant:

I would like to create a very simplified HMM where each observation has a probability of 1 of being found in one specific state.

I am going to use the `model.predict_proba(sequence)` function.

If I use this function (inserting a sequence of 500 observations) with my HMM with 3 states (each state defined as a MultivariateGaussianDistribution), I get a 500x3 matrix of probabilities. Now I would like to change this matrix so that, for every single observation, just one state has nearly 100%. Is this goal reachable by detecting the most significant emission dimension for each state and defining each MultivariateGaussianDistribution with just that state-specific dimension? Or do I have to use an IndependentComponentsDistribution with 1 non-zeroed NormalDistribution/UniformDistribution (out of 4, whereby 3 of them are zeroed)?
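For reference, the 500x3 matrix that `predict_proba` returns is the forward-backward posterior over states, not the emission matrix itself. A small self-contained numpy sketch of that computation (toy numbers, not pomegranate's internals) shows how near-deterministic emission likelihoods produce posterior rows that concentrate near 100% on one state:

```python
import numpy as np

def posteriors(pi, A, B):
    """Forward-backward state posteriors.
    pi: (K,)   initial state probabilities
    A:  (K, K) transition matrix, A[i, j] = P(next = j | current = i)
    B:  (T, K) per-observation emission likelihoods P(x_t | state)
    Returns a (T, K) matrix whose rows sum to 1."""
    T, K = B.shape
    alpha = np.zeros((T, K))           # forward messages
    beta = np.ones((T, K))             # backward messages
    alpha[0] = pi * B[0]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[t]
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[t + 1] * beta[t + 1])
    gamma = alpha * beta
    return gamma / gamma.sum(axis=1, keepdims=True)

# Toy example: 3 states, 4 observations, near-deterministic emissions.
pi = np.full(3, 1 / 3)
A = np.full((3, 3), 1 / 3)
B = np.array([[0.98, 0.01, 0.01],
              [0.01, 0.98, 0.01],
              [0.01, 0.01, 0.98],
              [0.98, 0.01, 0.01]])
post = posteriors(pi, A, B)
```

With a uniform transition matrix the posterior for each observation is just its normalized emission row, so each row of `post` puts roughly 98% of the mass on a single state.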

I know that this probably sounds strange and raises the question of why I am using an HMM at all. But I would like to detect the impact of my emission matrix and transition matrix on the prediction accuracy.

jmschrei commented 6 years ago

If the question you're asking is how do you make the components of the model fit the data better, the answer is going to be heavily dependent on the data. I would expect that adding more parameters would help, and so a multivariate Gaussian distribution would help more than an independent components distribution. Why do you think that such a thing is even possible, given the data? If I saw results like that I'd think that the model was severely overfitting.

B4marc commented 6 years ago

Mainly because I tried out a Naive Bayes classifier and it is working out better than my current HMM. Regarding the distribution, UniformDistribution fit best with the Naive Bayes classifier; second best was LognormalDistribution. I tested the data for a multivariate Gaussian distribution and the test rejected it (I'd better open a new issue for more on that). The result matches most of the visualisations of the data. That is why I came up with the idea:

Is this goal reachable by detecting the most significant emission dimension for each state and defining each MultivariateGaussianDistribution with just that state-specific dimension? Or do I have to use an IndependentComponentsDistribution with 1 non-zeroed NormalDistribution/UniformDistribution (out of 4, whereby 3 of them are zeroed)?

I actually asked about IndependentComponentsDistribution since I would like to test multivariate LogNormalDistribution/UniformDistribution combinations.
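The idea behind an independent-components distribution is simply that the joint density of a vector is the product of per-dimension densities, so dimensions can mix families (e.g. lognormal and uniform). A hedged numpy sketch of that product for a hypothetical 4-D state (illustrative function names, not pomegranate's IndependentComponentsDistribution API) is:

```python
import numpy as np

def lognormal_pdf(x, mu, sigma):
    """Density of LogNormal(mu, sigma) at x > 0."""
    return np.exp(-(np.log(x) - mu) ** 2 / (2 * sigma ** 2)) \
        / (x * sigma * np.sqrt(2 * np.pi))

def uniform_pdf(x, a, b):
    """Density of Uniform(a, b) at x."""
    return np.where((a <= x) & (x <= b), 1.0 / (b - a), 0.0)

def independent_components_pdf(x):
    """Joint density of a 4-D observation whose dimensions are independent:
    dims 0-2 ~ LogNormal(0, 1), dim 3 ~ Uniform(0, 2).
    Independence means the joint is just the product of the 1-D densities."""
    return (lognormal_pdf(x[0], 0.0, 1.0)
            * lognormal_pdf(x[1], 0.0, 1.0)
            * lognormal_pdf(x[2], 0.0, 1.0)
            * uniform_pdf(x[3], 0.0, 2.0))

# At x = (1, 1, 1, 1): each lognormal term is 1/sqrt(2*pi), the uniform
# term is 1/2, so the joint is 0.5 * (2*pi)**-1.5.
p = independent_components_pdf(np.array([1.0, 1.0, 1.0, 1.0]))
```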

Nevertheless, thank you very much for your help!