Open AKuederle opened 1 year ago
I'm able to reproduce this with the following script:
import torch
from pomegranate.distributions import Normal
from pomegranate.hmm import DenseHMM
X = [torch.randn(i // 5, i, 3) for i in range(20, 100)]
d = [Normal(), Normal()]
model = DenseHMM(d, verbose=True)
model.fit(X)
Looks like I only tested when the first dimension (batch size) is the same across all batches. I'll add in a fix today but I'm waiting for a bit more user feedback before releasing the first patch.
Here's a workaround: just reshape the data yourself and call _initialize. All it's doing is running k-means on a 2D matrix so you can break sequence boundaries without concern.
import torch
from pomegranate.distributions import Normal
from pomegranate.hmm import DenseHMM
X = [torch.randn(i // 5, i, 3) for i in range(20, 100)]
X_ = torch.cat([x.reshape(-1, 3) for x in X], dim=0).unsqueeze(0) # Add this
d = [Normal(), Normal()]
model = DenseHMM(d, verbose=True)
model._initialize(X_) # Add this too
model.fit(X)
Would you mind providing simple reproducing scripts in the future? That would help me debug.
Thanks for looking into this!
Did that code work for you?
Hi, I am experiencing the same problem. Taking the example given in the documentation, how can I add a second sequence to X and then call model.predict(X)? I want the model to learn all the parameters based on all the observed sequences.
sequence = 'CGACTACTGACTACTCGCCGACGCGACTGCCGTCTATACTGCGCATACGGC'
X = numpy.array([[[['A', 'C', 'G', 'T'].index(char)] for char in sequence]])
X.shape
Like adding the sequence below to X
sequence1 = 'CGACTACTGACTACTCGCCGACGCGACTGCC'
If you want to process two sequences of different lengths, you'll need to run predict twice, each on a tensor with a batch size of 1 and differing sequence lengths. Each method can only run on a tensor of a fixed size. fit can operate on tensors of different sizes only because I added a convenient utility inside.
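A minimal sketch of the "one call per sequence" approach described above, assuming an already fitted model whose predict accepts an array of shape (1, length, 1) as in the documentation example. The names ALPHABET, encode, and predict_each are hypothetical helpers introduced here for illustration; they are not part of pomegranate.

```python
import numpy

ALPHABET = ['A', 'C', 'G', 'T']

def encode(sequence):
    """Encode a DNA string as an integer array of shape (1, len(sequence), 1)."""
    return numpy.array([[[ALPHABET.index(char)] for char in sequence]])

sequences = [
    'CGACTACTGACTACTCGCCGACGCGACTGCCGTCTATACTGCGCATACGGC',
    'CGACTACTGACTACTCGCCGACGCGACTGCC',
]

# One array per sequence, each with a batch size of 1 and its own length.
batches = [encode(s) for s in sequences]

def predict_each(model, batches):
    """Call model.predict once per array (hypothetical helper, assumes a fitted model)."""
    return [model.predict(X) for X in batches]
```

For training, the same list of per-sequence arrays could be passed to fit in one call, since fit is described above as accepting tensors of different sizes.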
Describe the bug
If the model is not initialized, the fit method initializes it before running the actual fit. This works if only sequences of the same length are passed. However, if sequences of unequal length are provided (as supported by the fit method), the following line fails, because sequences of different lengths cannot be concatenated.
https://github.com/jmschrei/pomegranate/blob/c77f967a2b66505b42a4fc4063fcf1d26406a9a5/pomegranate/hmm/_base.py#L587
Not sure what the correct solution is here...
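The shape problem can be illustrated in isolation. The sketch below uses numpy as a stand-in (torch.cat imposes the same requirement that all dimensions other than the concatenation axis match); the array sizes are made up for illustration. Concatenating along the batch axis fails when sequence lengths differ, while flattening each batch to a 2D matrix first succeeds.

```python
import numpy

# Two batches with different batch sizes and sequence lengths, 3 features each.
a = numpy.zeros((4, 20, 3))
b = numpy.zeros((5, 25, 3))

# Direct concatenation along the batch axis fails because 20 != 25.
try:
    numpy.concatenate([a, b], axis=0)
    concat_failed = False
except ValueError:
    concat_failed = True

# Flattening each batch to (batch * length, features) removes the mismatch.
flat = numpy.concatenate([x.reshape(-1, 3) for x in (a, b)], axis=0)
```

Flattening discards sequence boundaries, which is acceptable only when the consumer treats every observation independently.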