jmschrei / pomegranate

Fast, flexible and easy to use probabilistic modelling in Python.
http://pomegranate.readthedocs.org/en/latest/
MIT License
3.35k stars, 589 forks

HMM - multivariate time series #1067

Closed by Koenig128 10 months ago

Koenig128 commented 10 months ago

I want to use from_samples to build an HMM. I have multivariate time series data with several observed variables (a, b, c) per timepoint t. Following what I read here, I am passing the data as a list of numpy arrays where the rows are timepoints and the columns are features (observed variables).

array = numpy.array([[t1a, t1b, t1c], [t2a, t2b, t2c]])

model = HiddenMarkovModel.from_samples(MultivariateGaussianDistribution, n_components=20, X=array)

I am using "sample" to generate data from the model.

samples = model.sample(length = 100)

As far as I understand, the sequence generated by this method should be a list of emitted items. In the example above, however, "samples" contains 100 lists, each with only one element. I intended to build a multivariate HMM with three emitted variables per timepoint, so why does "samples" not contain lists with three values each?

I would really appreciate any help and feedback on this!

Thank you very much in advance. I read the pomegranate documentation and could not find an answer to this question.

jmschrei commented 10 months ago

Hello

Potentially, you're not finding an answer in the documentation because that API has been deprecated. The new v1.0.0 API uses DenseHMM or SparseHMM and only supports a fit function, to be more in line with scikit-learn and other packages. Please read the README for more details and let me know if you have any other questions.

If you want to keep using that version of pomegranate, you need to reshape your data to be 3D, with the dimensions being the batch size, the sequence length, and the number of features per example, even if you only have one sequence (batch size = 1). 2D inputs are assumed to be univariate. Alternatively, you can pass in a list of 2D arrays if the arrays have different lengths. It looks like you've passed in a single 2D array, unless I'm misinterpreting what the variables in the array are. The new version of pomegranate requires all tensors to be 3D to help avoid these issues.
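As a minimal sketch of the reshaping described above, using only numpy: the data values, the sequence lengths, and the names rng, second, X and X_list are made up for illustration, and no pomegranate call is shown or executed here.

```python
import numpy as np

# Hypothetical data: one sequence with 5 timepoints and 3 observed
# variables (a, b, c) per timepoint. Rows are timepoints, columns are features.
rng = np.random.default_rng(0)
array = rng.normal(size=(5, 3))      # shape: (length, n_features)

# Add a leading batch axis so the shape becomes
# (batch size, length, n_features), even though there is only one sequence.
X = array[np.newaxis, :, :]
print(X.shape)                       # (1, 5, 3)

# Alternative for sequences of different lengths: a list of 2D arrays,
# each with shape (length_i, n_features).
second = rng.normal(size=(8, 3))     # a second, longer hypothetical sequence
X_list = [array, second]
print([seq.shape for seq in X_list])  # [(5, 3), (8, 3)]
```

Here X is the 3D form and X_list is the list-of-2D-arrays form; either one, rather than the bare 2D array, is the kind of input meant above.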

Koenig128 commented 10 months ago

Dear Jacob,

thank you very much for your help! I am using the new API and the fit function now.