jmschrei / pomegranate

Fast, flexible and easy to use probabilistic modelling in Python.
http://pomegranate.readthedocs.org/en/latest/
MIT License

Implementing an HMM with GMM emissions #981

Closed: teoML closed this issue 1 year ago

teoML commented 2 years ago

Dear @jmschrei, thanks for creating this cool library! I'm having trouble understanding how to create a GMM-HMM.

In my dataset, each timestep has 4 sensor measurements (the observation), and each timestep is labeled with one of 3 states (A1, A2, A3). I want to train an HMM using labeled training, and I want the emissions to be modeled by a Gaussian Mixture Model, since the sensor measurements don't seem to follow a normal or any other standard distribution. In terms of code, I tried the following approach (you can run it on Google Colab using this link https://colab.research.google.com/drive/10NF6xrCzuoQs7atH1hewhqhkiABFU61Y?usp=sharing ):

import numpy as np
import pomegranate as pg  # pre-1.0 (Cython) pomegranate API

# One sequence of 36 timesteps x 4 sensor features -> shape (1, 36, 4)
obs_seq = np.array([[[0.4, 0.32, 0.56, 0.7],[0.4, 0.82, 0.96, 0.47],[0.43, 0.12, 0.56, 0.27],[0.4, 0.9, 0.46, 0.1],[0.2, 0.32, 0.36, 0.1],[0.14, 0.267, 0.68, 0.57], [0.34, 0.762, 0.76, 0.73], [0.4, 0.22, 0.56, 0.47], [0.43, 0.12, 0.56, 0.27], [0.24, 0.19, 0.84, 0.1], [0.22, 0.32, 0.61, 0.7], [0.94, 0.234, 0.83, 0.77],
           [0.34, 0.52, 0.89, 0.4],[0.9, 0.72, 0.56, 0.17],[0.43, 0.12, 0.56, 0.27], [0.64, 0.69, 0.48, 0.1],[0.25, 0.362, 0.16, 0.6],[0.34, 0.214, 0.18, 0.67],
           [0.64, 0.72, 0.77, 0.1],[0.3, 0.62, 0.76, 0.37],[0.43, 0.12, 0.56, 0.27],[0.74, 0.52, 0.96, 0.1],[0.22, 0.342, 0.46, 0.5],[0.54, 0.63, 0.67, 0.27],
           [0.14, 0.38, 0.26, 0.5],[0.5, 0.52, 0.12, 0.657],[0.43, 0.12, 0.56, 0.27],[0.33, 0.26, 0.93, 0.1],[0.432, 0.32, 0.66, 0.3],[0.74, 0.07, 0.43, 0.47],
           [0.24, 0.22, 0.36, 0.6],[0.67, 0.32, 0.16, 0.26],[0.43, 0.12, 0.56, 0.27],[0.67, 0.22, 0.90, 0.1],[0.22, 0.314, 0.42, 0.2],[0.84, 0.17, 0.13, 0.67]]])

obs_states = np.array([["A1", "A3", "A1", "A1", "A1", "A3", 
              "A3", "A2", "A2", "A1", "A3", "A1",
              "A3", "A3", "A1", "A1", "A1", "A3", 
              "A2", "A2", "A1", "A1", "A3", "A1",
              "A2", "A3", "A1", "A1", "A1", "A3", 
              "A2", "A2", "A1", "A1", "A3", "A2",
              ]])
states_names = ["A2", "A1", "A3"]

# Extract each sensor feature as a column vector of shape (36, 1)
values_f1 = obs_seq[0][:, 0].reshape(-1, 1)
values_f2 = obs_seq[0][:, 1].reshape(-1, 1)
values_f3 = obs_seq[0][:, 2].reshape(-1, 1)
values_f4 = obs_seq[0][:, 3].reshape(-1, 1)

GMM_f1 = pg.GeneralMixtureModel.from_samples(pg.MultivariateGaussianDistribution, 5, values_f1)
GMM_f2 = pg.GeneralMixtureModel.from_samples(pg.MultivariateGaussianDistribution, 5, values_f2)
GMM_f3 = pg.GeneralMixtureModel.from_samples(pg.MultivariateGaussianDistribution, 5, values_f3)
GMM_f4 = pg.GeneralMixtureModel.from_samples(pg.MultivariateGaussianDistribution, 5, values_f4)

distribs = [GMM_f1, GMM_f2, GMM_f3, GMM_f4]

model = pg.HiddenMarkovModel.from_samples(distribs,
                                          n_components = 3,
                                          state_names = states_names,
                                          X = obs_seq, 
                                          labels= obs_states,
                                          algorithm='labeled',
                                          random_state = 40
                                          )

As you can see, I train a separate GMM for each of my 4 features (sensor measurements). Now I want to build an HMM that uses these distributions for the emission probabilities. However, running the code above raises an error: TypeError: from_samples() takes at least 3 positional arguments (2 given)
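For reference, in an HMM each hidden state (A1, A2, A3) has its own emission distribution over the full 4-feature observation, so the natural unit for a mixture is one GMM per state rather than one per feature. With fully labeled sequences, maximum-likelihood training splits into counting state transitions and fitting each state's emission on the rows carrying its label. A minimal NumPy sketch of the counting and grouping steps, assuming a single labeled sequence; `labeled_counts` and `group_by_state` are hypothetical helpers, not pomegranate functions:

```python
import numpy as np

def labeled_counts(labels, state_names):
    """Estimate start and transition probabilities from one labeled sequence.

    Hypothetical helper (not part of pomegranate). trans[i, j] is the
    maximum-likelihood estimate of P(next state j | current state i).
    """
    idx = {name: i for i, name in enumerate(state_names)}
    k = len(state_names)
    seq = [idx[s] for s in labels]

    # With a single sequence, the first label gets all of the start mass
    start = np.zeros(k)
    start[seq[0]] = 1.0

    # Count observed transitions a -> b between consecutive timesteps
    counts = np.zeros((k, k))
    for a, b in zip(seq[:-1], seq[1:]):
        counts[a, b] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    # Rows with no outgoing transitions fall back to uniform to avoid 0/0
    trans = np.where(row_sums > 0, counts / np.maximum(row_sums, 1), 1.0 / k)
    return start, trans

def group_by_state(X, labels, state_names):
    """Collect the full feature rows belonging to each state (hypothetical helper)."""
    labels = np.asarray(labels)
    return {name: X[labels == name] for name in state_names}

# Small illustrative input: 6 timesteps x 4 features, same state names as above
labels = ["A1", "A3", "A1", "A1", "A2", "A2"]
X = np.arange(24, dtype=float).reshape(6, 4)
start, trans = labeled_counts(labels, ["A2", "A1", "A3"])
groups = group_by_state(X, labels, ["A2", "A1", "A3"])
```

Each array in `groups` keeps all 4 features, so a per-state emission model (a mixture, for instance) would be fit on `groups["A1"]`, `groups["A2"]` and `groups["A3"]` separately.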

Could you explain how to correctly train a GMM-HMM on my data? Thank you!

jmschrei commented 2 years ago

Unfortunately, you'll need to write out the HMM by hand (i.e., using the add_edge and add_state methods) and pass in the mixture models when creating the State objects. See the HMM tutorials. The from_samples method only works with base distributions for now.
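Outside pomegranate, the per-state mixture emission described here can be sketched in plain NumPy: fit a small Gaussian mixture by EM to the 4-feature rows labeled with one state, then score observations with its log-density. This is a stand-in, not pomegranate's GeneralMixtureModel; it assumes diagonal covariances, a fixed number of EM iterations and a small variance floor, and `fit_diag_gmm` and `gmm_logpdf` are hypothetical names.

```python
import numpy as np

def _logsumexp(a, axis):
    # Numerically stable log(sum(exp(a))) along one axis
    m = a.max(axis=axis, keepdims=True)
    return (m + np.log(np.exp(a - m).sum(axis=axis, keepdims=True))).squeeze(axis)

def _log_joint(X, weights, means, variances):
    # log[w_c * N(x_n | mu_c, diag(var_c))] for every row n and component c -> (n, c)
    diff = X[:, None, :] - means[None, :, :]
    return (np.log(weights)[None, :]
            - 0.5 * np.log(2 * np.pi * variances).sum(axis=1)[None, :]
            - 0.5 * (diff ** 2 / variances[None, :, :]).sum(axis=2))

def gmm_logpdf(X, weights, means, variances):
    """Log-density of each row of X under a diagonal-covariance mixture."""
    return _logsumexp(_log_joint(np.atleast_2d(X), weights, means, variances), axis=1)

def fit_diag_gmm(X, n_components=2, n_iter=100, var_floor=1e-3, seed=0):
    """Fit a diagonal-covariance GMM to X (n_samples, n_features) with plain EM."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # Initialise means at randomly chosen rows, variances at the global variance
    means = X[rng.choice(n, size=n_components, replace=False)].copy()
    variances = np.tile(X.var(axis=0) + var_floor, (n_components, 1))
    weights = np.full(n_components, 1.0 / n_components)
    for _ in range(n_iter):
        # E-step: responsibilities resp[n, c] = P(component c | x_n)
        lj = _log_joint(X, weights, means, variances)
        resp = np.exp(lj - _logsumexp(lj, axis=1)[:, None])
        # M-step: re-estimate weights, means and floored variances
        nk = resp.sum(axis=0) + 1e-12
        weights = nk / n
        means = (resp.T @ X) / nk[:, None]
        diff = X[:, None, :] - means[None, :, :]
        variances = (resp[:, :, None] * diff ** 2).sum(axis=0) / nk[:, None] + var_floor
    return weights, means, variances

# Example: one state's rows (synthetic here), 4 features, two clusters
rng = np.random.default_rng(1)
X_state = np.vstack([rng.normal(0.2, 0.05, size=(20, 4)),
                     rng.normal(0.8, 0.05, size=(20, 4))])
params = fit_diag_gmm(X_state, n_components=2)
ll = gmm_logpdf(X_state[:3], *params)
```

In a labeled setting you would fit one such mixture per state, on that state's rows only, keeping the number of components well below the number of rows available for that state.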

teoML commented 2 years ago

Which tutorial exactly? Is there a GMM-HMM example somewhere? I looked for one but could not find it. Thank you!

jmschrei commented 2 years ago

Sorry, I meant that you can follow a tutorial on how to write out an HMM by hand using the add_edge and add_state methods.
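Whatever emission each hand-built state carries, decoding only needs each state's log-likelihood of each observation, which is why a mixture can stand in for a single Gaussian. A minimal NumPy Viterbi sketch, assuming log start probabilities, a log transition matrix and a precomputed (T, K) log-emission matrix (for example, per-state mixture log-densities); `viterbi` is a hypothetical helper, not pomegranate's decoder.

```python
import numpy as np

def viterbi(log_start, log_trans, log_emit):
    """Most likely state path for one sequence.

    log_start: (K,) log start probabilities
    log_trans: (K, K) with log P(next = j | current = i) at [i, j]
    log_emit:  (T, K) with log p(x_t | state k), from any emission model
    """
    T, K = log_emit.shape
    score = log_start + log_emit[0]        # best log-prob of paths ending in each state
    back = np.zeros((T, K), dtype=int)     # backpointers
    for t in range(1, T):
        cand = score[:, None] + log_trans  # cand[i, j]: best path in i, then move i -> j
        back[t] = cand.argmax(axis=0)
        score = cand.max(axis=0) + log_emit[t]
    path = [int(score.argmax())]
    for t in range(T - 1, 0, -1):          # follow backpointers from the end
        path.append(int(back[t, path[-1]]))
    return path[::-1], float(score.max())

# Toy two-state example with "sticky" transitions
log_start = np.log([0.5, 0.5])
log_trans = np.log([[0.8, 0.2], [0.2, 0.8]])
log_emit = np.log([[0.9, 0.1], [0.9, 0.1], [0.05, 0.95], [0.05, 0.95]])
path, best = viterbi(log_start, log_trans, log_emit)
print(path)  # [0, 0, 1, 1]
```

In a GMM-HMM, column k of `log_emit` would hold the log-density of each observation under state k's mixture; the decoder itself does not change.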

al33m501 commented 1 year ago

@teoML Hi! Did you solve the problem?

jmschrei commented 1 year ago

Thank you for opening an issue. pomegranate has recently been rewritten from the ground up to use PyTorch instead of Cython (v1.0.0), and so all issues are being closed as they are likely out of date. Please re-open or start a new issue if a related issue is still present in the new codebase.