from_samples equivalent code after v1.0

KunFang93 commented 5 months ago

Hi,

Thank you so much for providing this awesome package! I used it before v1.0 and it worked perfectly. I now plan to update my previous code to v1.0.4. I went through all the documentation but am still lost on how to port the following code to the new pomegranate:

hmm,history = pg.HiddenMarkovModel.from_samples(
    pg.MultivariateGaussianDistribution,
    n_components=self.num_states,
    X=self.obs_seq,
    algorithm='baum-welch',
    return_history=True,
    max_iterations= self.max_iter,
    n_jobs = self.n_jobs,
    verbose=True
)

I wondered if you could shed some light on this. If more information is needed, please let me know. Thanks in advance!

Best, Kun

jmschrei commented 5 months ago

Hi Kun

I'd recommend using DenseHMM and the fit function. You'll need to pass the number of states, the maximum number of iterations, and the verbosity setting into the initialization of the model. You'll also need to use Normal instead of MultivariateGaussianDistribution. By default, Normal uses a full covariance matrix. Unfortunately, I don't think that the return history functionality made it into the port.

KunFang93 commented 5 months ago

Hi Jacob,

Thanks for your reply! I wondered if this looks good to you:

model = DenseHMM([Normal()]*12, init='random', max_iter=1000, tol=0.1) # num of state is 12
model.fit(data) 
hs = model.predict(data)

I also have a question about the dimensions of the data. Assume I have 10000 observations, each of dimension 11. I would like to get an output with 10000 hidden state labels. Should I reshape the data? I tried the following simple example, but it failed:

model3 = DenseHMM([Normal(), Normal(), Normal()], verbose=True)
X3 = torch.randn(10000, 11)
model3.fit(X3)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/kfang/.conda/envs/mamba/envs/pomegranate/lib/python3.12/site-packages/pomegranate/hmm/_base.py", line 581, in fit
    X, sample_weight, priors = partition_sequences(X,
                               ^^^^^^^^^^^^^^^^^^^^^^
  File "/home/kfang/.conda/envs/mamba/envs/pomegranate/lib/python3.12/site-packages/pomegranate/_utils.py", line 462, in partition_sequences
    x = _check_parameter(x, "X", ndim=2)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/kfang/.conda/envs/mamba/envs/pomegranate/lib/python3.12/site-packages/pomegranate/_utils.py", line 231, in _check_parameter
    raise ValueError("Parameter {} must have {} dims".format(
ValueError: Parameter X must have 2 dims

But if I reshape it into (1, 10000, 11), I encounter this error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/kfang/.conda/envs/mamba/envs/pomegranate/lib/python3.12/site-packages/pomegranate/hmm/_base.py", line 604, in fit
    logp += self.summarize(X_, sample_weight=w_, priors=p_).sum()
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/kfang/.conda/envs/mamba/envs/pomegranate/lib/python3.12/site-packages/pomegranate/hmm/dense_hmm.py", line 543, in summarize
    X, emissions, sample_weight = super().summarize(X,
                                  ^^^^^^^^^^^^^^^^^^^^
  File "/home/kfang/.conda/envs/mamba/envs/pomegranate/lib/python3.12/site-packages/pomegranate/hmm/_base.py", line 679, in summarize
    X = _check_parameter(_cast_as_tensor(X), "X", ndim=3,
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/kfang/.conda/envs/mamba/envs/pomegranate/lib/python3.12/site-packages/pomegranate/_utils.py", line 250, in _check_parameter
    raise ValueError("Parameter {} must have shape {}".format(
ValueError: Parameter X must have shape (-1, -1, 2)

If I reshape it into (10000, 11, 1), it works, but the predicted result has shape (10000, 11).

Thanks for your help again!

Best, Kun

jmschrei commented 5 months ago

Your code looks good to me, in general, but I haven't run it to make sure there aren't any syntax errors.

As for the shape question, please read the documentation: https://pomegranate.readthedocs.io/en/latest/tutorials/B_Model_Tutorial_4_Hidden_Markov_Models.html

"Also, make sure that your input sequence is 3D with the three dimensions corresponding to (batch_size, sequence length, dimensionality). Here, batch_size and dimensionality are both 1. The inclusion of batch size helps significantly when processing several sequences in parallel."

KunFang93 commented 5 months ago

Thanks for your reply! Got it, I was too careless and missed this information...
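
The shape convention quoted from the tutorial, (batch_size, sequence length, dimensionality), can be illustrated with a small sketch. Under that convention, 10000 observations of dimension 11 that form one sequence would be laid out as (1, 10000, 11), whereas (10000, 11, 1) reads as 10000 sequences of length 11 with one feature each, which matches the (10000, 11) prediction shape reported in the thread. The helper name `as_single_sequence` is hypothetical, and the commented usage at the end assumes a freshly created, fitted model.

```python
import numpy as np


def as_single_sequence(X):
    """Reshape (n_observations, dimensionality) into a batch of one sequence.

    Returns an array of shape (1, n_observations, dimensionality).
    """
    X = np.asarray(X, dtype=np.float32)
    if X.ndim != 2:
        raise ValueError("expected a 2D array of shape (n_observations, dimensionality)")
    # Add a leading batch axis of size 1.
    return X[np.newaxis, :, :]


rng = np.random.default_rng(0)
X = rng.normal(size=(10000, 11))  # toy stand-in for the real observations
X3 = as_single_sequence(X)
print(X3.shape)

# Assumed usage with a fitted pomegranate model (not executed here):
#   labels = model.predict(torch.from_numpy(X3))  # shape (1, 10000)
#   labels = labels[0]                            # shape (10000,)
```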

jmschrei / pomegranate

from_samples equivalent code after v1.0 #1105