jmschrei / pomegranate

Fast, flexible and easy to use probabilistic modelling in Python.
http://pomegranate.readthedocs.org/en/latest/
MIT License

How to do prediction via Model.predict(X) #985

Closed: KevinHHHHH0325 closed this issue 1 year ago

KevinHHHHH0325 commented 1 year ago

Hi Jacob, I sincerely hope to hear from you. I have been struggling with this model for almost a month and just can't figure it out. I believe my issue is a little more complex than others'.

I am trying to use an HMM for prediction, so let me briefly introduce my case first. I have a dataset of 8781 sequences, each with 48 rows. My label, which is the hidden state in the HMM, takes 94 distinct values, and the data has 73 features. That means the feature array has shape (8781, 48, 73) and the label array has shape (8781, 48, 1).

I create the emission and transition probabilities before I create the model. The transition matrix has shape (94, 94), and the emission probabilities have shape (73, 94): one emission distribution for each feature in each state. Then I use a couple of for loops to build the model. After baking it, I fit the model on 70% of my dataset, passing both the features and the labels, with shapes (6146, 48, 73) and (6146, 48, 1). After fitting, I call Model.predict() on the rest of the data, which I use as test data.

However, 735 of the 2635 test sequences produce ' Warning: Sequence is impossible'. Besides, all the results I get are numbers, while my labels (states) are all categories. What do those numbers represent? Last but not least, the 1900 "valid" predictions (excluding the impossible sequences) are very strange: most of them repeat a single number. For example, the first prediction is [67, 67, 67, 67, 67, 67, 67, 67, 67, 67, 67, 67, 67, 67, 67, 67, 67, 67, 67, 67, 67, 67, 67, 67, 67, 67, 67, 67, 67, 67, 67, 67, 67, 67, 67, 67, 67, 67, 67, 67, 67, 67, 67, 67, 67, 67, 67, 67]. The length of this result is right (48), but the values don't make sense; they should change as the state changes. Can you help me with all of the above? Thank you very much! And please let me know if you need more information!
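The post builds the (94, 94) transition matrix and the start probabilities by hand before creating the model. Below is a minimal sketch of the usual count-based (maximum-likelihood) estimate, assuming the categorical labels have already been integer-encoded (for example with `pandas.factorize`) and squeezed from (n, 48, 1) to (n, 48). The function name `estimate_start_and_transitions` and its `pseudocount` parameter are hypothetical illustrations, not part of pomegranate. Note that it takes start probabilities from the first label of each sequence, whereas the code in the post uses the overall label frequency in `df[13]`.

```python
import numpy as np


def estimate_start_and_transitions(labels, n_states, pseudocount=0.0):
    """Count-based estimates from integer-coded label sequences (hypothetical helper).

    labels: int array of shape (n_sequences, seq_len), values in [0, n_states).
    pseudocount: added to every count; a small positive value avoids zero
    probabilities for transitions never seen in training.
    """
    labels = np.asarray(labels)

    # Start probabilities: how often each state appears at position 0.
    start_counts = np.bincount(labels[:, 0], minlength=n_states) + pseudocount
    starts = start_counts / start_counts.sum()

    # Transition counts: every (previous state, next state) pair inside a sequence.
    trans_counts = np.full((n_states, n_states), pseudocount, dtype=float)
    np.add.at(trans_counts, (labels[:, :-1].ravel(), labels[:, 1:].ravel()), 1)

    # Normalise each row; a state never followed by anything keeps an all-zero row.
    row_sums = trans_counts.sum(axis=1, keepdims=True)
    trans = np.divide(trans_counts, row_sums,
                      out=np.zeros_like(trans_counts), where=row_sums > 0)
    return starts, trans


# Two toy sequences of length 3 over 3 states.
y = np.array([[0, 0, 1], [1, 2, 2]])
starts, trans = estimate_start_and_transitions(y, n_states=3)
print(starts)  # -> [0.5 0.5 0. ]
```

With 94 states and only 6146 training sequences, some transitions may never be observed; a small pseudocount keeps them possible, which matters when a test sequence contains a state change that training never saw.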

# In a notebook: !pip install "pomegranate<1.0"
# (HiddenMarkovModel, State and bake() belong to the pre-1.0 API.)
import pomegranate as p
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split

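# label_list, emission_prob_dict, trans_mat, df, X_train, y_train and X_test
# come from earlier notebook cells that are not shown here.
model = p.HiddenMarkovModel()

# One emission distribution per state: the 73 features are treated as
# independent discrete variables, so each state gets an
# IndependentComponentsDistribution over 73 DiscreteDistributions.
states_probs = []
for i in range(len(label_list)):
  feature_dists = [p.DiscreteDistribution(emission_prob_dict[j][i])
                   for j in range(len(emission_prob_dict))]
  states_probs.append(p.IndependentComponentsDistribution(feature_dists))

# Wrap each emission distribution in a State named after its label.
states = [p.State(dist, name=name) for dist, name in zip(states_probs, label_list)]
for state in states:
  model.add_states(state)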

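# Start probabilities from the frequency of each label in column 13 of df.
# Series.get returns 0 for a label that never occurs, so the list always has
# one entry per label, in the same order as label_list.
c = df[13].value_counts()
start_transition_prob = [c.get(label, 0) / len(df[13]) for label in label_list]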

# Edges from the silent start state, then the full 94 x 94 transition matrix.
for state, prob in zip(states, start_transition_prob):
  model.add_transition(model.start, state, prob)
for q in range(len(states)):
  for z in range(len(states)):
    model.add_transition(states[q], states[z], trans_mat[q][z])

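model.bake()
# Labels are only used by labeled training and need one state name per
# observation, so the (n, 48, 1) label array is flattened to (n, 48).
model.fit(X_train, labels=np.asarray(y_train).reshape(len(y_train), -1).tolist(),
          algorithm='labeled')
# predict returns integer indices into model.states (in the order fixed by
# bake()), not state names.
predictions = model.predict(X_test)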
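The numbers returned by `predict` are positions in the model's state list rather than labels. The sketch below is a plain NumPy Viterbi decoder (not pomegranate's own implementation) that makes this concrete: it returns integer state indices, which are then mapped back to names through a separate list. For a baked pomegranate 0.x model the analogous lookup is, to my understanding, `model.states[i].name`. The sketch also shows one plausible route to an impossible sequence: if every state assigns zero probability to some observation (for example a feature value missing from the emission tables), every path scores log(0) = -inf. The function name `viterbi` and the toy `names`, `start`, `trans` and `emit` values are hypothetical.

```python
import numpy as np


def viterbi(log_start, log_trans, log_emit):
    """Most likely state path for one sequence, computed in log space.

    log_start: (K,) log start probabilities.
    log_trans: (K, K) log transition probabilities, rows = from-state.
    log_emit:  (T, K) log-likelihood of each observation under each state.
               With independent features this is the sum of per-feature
               log-probabilities, so one zero-probability feature gives -inf.
    Returns (log_prob, path) with integer state indices, or (-inf, None)
    when every path has probability zero.
    """
    T, K = log_emit.shape
    score = log_start + log_emit[0]         # best log-prob ending in each state at t=0
    back = np.zeros((T, K), dtype=int)      # best predecessor for each (t, state)
    for t in range(1, T):
        cand = score[:, None] + log_trans   # cand[i, j]: best path ending in i, then i -> j
        back[t] = cand.argmax(axis=0)
        score = cand.max(axis=0) + log_emit[t]
    best = score.max()
    if np.isneginf(best):
        return best, None                   # no path has non-zero probability
    path = [int(score.argmax())]
    for t in range(T - 1, 0, -1):           # follow back-pointers to t=0
        path.append(int(back[t, path[-1]]))
    return best, path[::-1]


# Hypothetical two-state toy model.
names = ["rain", "sun"]
start = np.array([0.5, 0.5])
trans = np.array([[0.8, 0.2], [0.2, 0.8]])
emit = np.array([[0.9, 0.1], [0.9, 0.1], [0.1, 0.9]])  # P(obs_t | state), shape (T, K)

with np.errstate(divide="ignore"):          # log(0) -> -inf without a warning
    log_prob, path = viterbi(np.log(start), np.log(trans), np.log(emit))
print(path, [names[i] for i in path])       # -> [0, 0, 1] ['rain', 'rain', 'sun']
```

Setting any row of `emit` to all zeros makes the decoder return `(-inf, None)`, the same situation a library reports as an impossible sequence.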
KevinHHHHH0325 commented 1 year ago

@jmschrei Hi Jacob, sorry to keep bothering you. I just want to make sure: can you see this post?

jmschrei commented 1 year ago

Thank you for opening an issue. pomegranate has recently been rewritten from the ground up to use PyTorch instead of Cython (v1.0.0), and so all issues are being closed as they are likely out of date. Please re-open or start a new issue if a related issue is still present in the new codebase.