jmschrei / pomegranate

Fast, flexible and easy to use probabilistic modelling in Python.
http://pomegranate.readthedocs.org/en/latest/
MIT License

Markov Chain - Index out of bounds error #1083

Status: Open. Koenig128 opened this issue 3 months ago.

Koenig128 commented 3 months ago

Hi, I was trying to fit a Markov chain to my data and got an error. While searching the issues, I found that someone else had already reported it. It would be great if you could help me with this!

Thank you very much in advance!

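I experimented with changing the data; however, the issue is also reproducible with small random data:

```python
import numpy as np
from pomegranate.markov_chain import MarkovChain

np.random.seed(137)
# One sequence of 10 time steps with a single feature; values are integers in 0..9
seq_data = np.random.randint(0, 10, (1, 10, 1))

model = MarkovChain(k=1)  # first-order chain
model.fit(seq_data)  # raises an index-out-of-bounds RuntimeError on affected versions
```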

throws:

```
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Cell In[99], line 5
      2 seq_data = np.random.randint(0, 10, (1,6,1))
      4 model = MarkovChain(k = 1)
----> 5 model.fit(seq_data)

File /opt/conda/lib/python3.10/site-packages/pomegranate/markov_chain.py:216, in MarkovChain.fit(self, X, sample_weight)
    193 def fit(self, X, sample_weight=None):
    194     """Fit the model to optionally weighted examples.
    195 
    196     This method will fit the provided distributions given the data and
   (...)
    213     self
    214     """
--> 216     self.summarize(X, sample_weight=sample_weight)
    217     self.from_summaries()
    218     return self

File /opt/conda/lib/python3.10/site-packages/pomegranate/markov_chain.py:276, in MarkovChain.summarize(self, X, sample_weight)
    274 for i in range(X.shape[1] - self.k):
    275     j = i + self.k + 1
--> 276     distribution.summarize(X[:, i:j], sample_weight=sample_weight)

File /opt/conda/lib/python3.10/site-packages/pomegranate/distributions/conditional_categorical.py:168, in ConditionalCategorical.summarize(self, X, sample_weight)
    165 strides = torch.tensor(self._xw_sum[j].stride(), device=X.device)
    166 X_ = torch.sum(X[:, :, j] * strides, dim=-1)
--> 168 self._xw_sum[j].view(-1).scatter_add_(0, X_, sample_weight[:,j])
    169 self._w_sum[j][:] = self._xw_sum[j].sum(dim=-1)

RuntimeError: index 42 is out of bounds for dimension 0 with size 28
```

Originally posted by @salpers in https://github.com/jmschrei/pomegranate/issues/1077#issuecomment-1905698817
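The last two frames of the traceback show how counts are accumulated. `MarkovChain.summarize` slides a window `X[:, i:j]` of length `k + 1` along each sequence. `ConditionalCategorical.summarize` then turns each window into a single flat index (the window's values multiplied by the count table's strides and summed) and adds the sample weight at that position with `scatter_add_`. Below is a minimal NumPy sketch of that idea. It is not pomegranate's code: `transition_counts` and `transition_probs` are hypothetical helpers, `np.add.at` stands in for PyTorch's `scatter_add_`, every sample is assumed to have weight 1, and categories are assumed to be the integers `0..n_categories - 1`. `transition_probs` plays the role of the `from_summaries()` call that `fit` makes after `summarize`, under the assumption that this step amounts to row-normalizing the counts. The sketch reproduces the class of error in the traceback, an index past the end of the flattened table, without implying which of pomegranate's bugs caused it.

```python
import numpy as np


def transition_counts(X, n_categories, k=1):
    """Count order-k transitions in X of shape (n_sequences, length, 1).

    Hypothetical helper: a NumPy sketch of the windowed, strided
    scatter-add seen in the traceback, with unit sample weights.
    """
    X = np.asarray(X)[:, :, 0]  # drop the trailing feature axis
    # Element strides of a C-ordered table of shape (n_categories,) * (k + 1),
    # e.g. (n_categories, 1) for k = 1.
    strides = n_categories ** np.arange(k, -1, -1)
    flat = np.zeros(n_categories ** (k + 1))
    for i in range(X.shape[1] - k):
        window = X[:, i:i + k + 1]             # shape (n_sequences, k + 1)
        idx = (window * strides).sum(axis=-1)  # one flat index per sequence
        # Unbuffered add, so repeated indices are all counted.
        # An index >= flat.size raises IndexError (out of bounds).
        np.add.at(flat, idx, 1.0)
    return flat.reshape((n_categories,) * (k + 1))


def transition_probs(counts):
    """Normalize counts along the last axis; unobserved rows stay all zero."""
    totals = counts.sum(axis=-1, keepdims=True)
    return np.divide(counts, totals, out=np.zeros_like(counts), where=totals > 0)


np.random.seed(137)
seq_data = np.random.randint(0, 10, (1, 10, 1))
counts = transition_counts(seq_data, n_categories=10)
print(counts.sum())  # 9.0: 10 time steps give 9 transitions
probs = transition_probs(counts)

# A value outside 0..n_categories-1 can push the flat index past the end:
# with n_categories=3 the window (2, 5) maps to 2 * 3 + 5 = 11, but the table has 9 entries.
try:
    transition_counts(np.array([[[2], [5]]]), n_categories=3)
except IndexError as err:
    print("IndexError:", err)
```

With this scheme, an out-of-range value does not always fail loudly: with `n_categories=3`, the window `(0, 5)` maps to flat index 5, which lands inside the 9-entry table and is silently counted as the transition `(1, 2)`.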

jmschrei commented 3 months ago

Hi @Koenig128. Sorry for the delay on this. It turns out that there are a series of small bugs that sometimes mask each other. I am working my way through the code resolving these. A challenge I'm encountering is that I remember finishing the implementation of ConditionalCategorical at an airport just before boarding and thinking "thank god I never have to think about that again" and helpfully leaving myself only the docstring """Still under development.""" for the class.

jmschrei commented 3 months ago

This should be fixed in v1.0.4, and I've added a unit test that uses this as an example. Please let me know if you encounter any other issues.