This is a recurring problem. Basically, when you have many more states than you need to model your data, some states are assigned few or no points because the remaining states already model the data well. When a state has too few examples, there may not be enough variance among them, which causes the Cholesky decomposition to fail. I know scikit-learn gets around this using some trick -- I need to get around to implementing that. In the meantime, it's basically a sign that your model has too many parameters. With more complex real-world data you'd be less likely to see this error, because all of the states you chose would be needed to capture its heterogeneity.
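As a toy illustration of the failure mode: a state that is effectively assigned only near-identical points ends up with zero sample variance, so its covariance matrix is singular and the Cholesky factorization fails.

```python
import torch

# A state assigned only (near-)identical points has zero sample variance,
# so its covariance matrix is singular and not positive-definite.
X = torch.tensor([[1.0, 2.0], [1.0, 2.0], [1.0, 2.0]])
cov = torch.cov(X.T)  # all zeros here

try:
    torch.linalg.cholesky(cov)
except RuntimeError as e:  # torch raises a RuntimeError subclass (_LinAlgError)
    print(e)  # "... could not be completed because the input is not positive-definite ..."
```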
I just had a look at the scikit-learn code out of interest. They basically add a small value (`reg_covar`) to the terms of the covariance matrix. For example, see here: https://github.com/scikit-learn/scikit-learn/blob/093e0cf14aff026cca6097e8c42f83b735d26358/sklearn/mixture/_gaussian_mixture.py#L153
This value is set to 1e-6 by default:
```
reg_covar : float, default=1e-6
    Non-negative regularization added to the diagonal of covariance.
    Allows to assure that the covariance matrices are all positive.
```
In case the decomposition still fails, the error message suggests either reducing the number of states or increasing the value of `reg_covar`.
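For reference, the regularization pattern in the linked scikit-learn code boils down to roughly this simplified sketch (not the full `_estimate_gaussian_covariances_full` implementation):

```python
import numpy as np

def regularize_covariance(cov: np.ndarray, reg_covar: float = 1e-6) -> np.ndarray:
    """Add reg_covar to the diagonal of a covariance matrix, sklearn-style."""
    cov = cov.copy()
    # flat[:: n + 1] indexes the diagonal of an (n, n) matrix.
    cov.flat[:: cov.shape[0] + 1] += reg_covar
    return cov
```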
Yeah, that's what I thought... I'll try adding that in soon. In the meantime, pomegranate is modular enough that you can just copy/paste the `Normal` class, add this in yourself, and drop those objects into native pomegranate objects.
If I understood correctly, it should be possible to subclass `Normal` and just add a small value to the `covs` calculated in this method:
```python
from pomegranate._utils import _update_parameter
from pomegranate.distributions import Normal


class StableNormal(Normal):
    def from_summaries(self):
        """Update the model parameters given the extracted statistics.

        This method uses calculated statistics from calls to the `summarize`
        method to update the distribution parameters. Hyperparameters for the
        update are passed in at initialization time.

        Note: Internally, a call to `fit` is just a successive call to the
        `summarize` method followed by the `from_summaries` method.
        """

        if self.frozen:
            return

        means = self._xw_sum / self._w_sum

        if self.covariance_type == 'full':
            v = self._xw_sum.unsqueeze(0) * self._xw_sum.unsqueeze(1)
            covs = self._xxw_sum / self._w_sum - v / self._w_sum ** 2.0

        elif self.covariance_type in ['diag', 'sphere']:
            covs = self._xxw_sum / self._w_sum - \
                self._xw_sum ** 2.0 / self._w_sum ** 2.0
            if self.covariance_type == 'sphere':
                covs = covs.mean(dim=-1)

        # This is the magic :)
        covs += 1e-6

        _update_parameter(self.means, means, self.inertia)
        _update_parameter(self.covs, covs, self.inertia)
        self._reset_cache()
```
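Usage would then be a drop-in replacement along these lines (a sketch; the state count is arbitrary, and `tensor_3d` is the training data from the original example):

```python
from pomegranate.hmm import DenseHMM

# Swap StableNormal in for Normal; everything else stays the same.
model = DenseHMM([StableNormal() for _ in range(20)], max_iter=10, verbose=True)
model.fit(tensor_3d)
```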
Yes, that's fine too. I was thinking you might want to add a user-defined covariance value, but hardcoding some small number is fine too if it works.
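A user-configurable version might look something like the following sketch (the parameter name `reg_covar` is borrowed from scikit-learn here, not an existing pomegranate argument):

```python
class ConfigurableNormal(Normal):
    """Sketch: expose the regularization value instead of hardcoding 1e-6."""

    def __init__(self, *args, reg_covar=1e-6, **kwargs):
        super().__init__(*args, **kwargs)
        self.reg_covar = reg_covar

    # from_summaries would then be identical to StableNormal above, with
    # `covs += self.reg_covar` in place of the hardcoded `covs += 1e-6`.
```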
Yes, that would of course be the more elegant version. On that note, the distribution has a parameter called `min_cov` which seems to be unused. Was this maybe intended for exactly this workaround?
Sorry, but to be clear: you probably want to add `covs += numpy.eye(covs.shape[0]) * 1e-6` under the full covariance type, right? You just want to add it to the diagonal of the covariance matrix, not to every element.
Ah yes, you are right, I misread the scikit-learn code! Adding it only to the diagonal makes much more sense.
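The corrected "magic" section of `from_summaries` would then look roughly like this (a sketch, assuming `import torch` at the top; `torch.eye` is used instead of `numpy.eye` so the jitter stays on the same device and dtype as `covs`):

```python
# Regularize only the diagonal for full covariance matrices; for 'diag'
# and 'sphere' the values are already variances, so a plain add is fine.
if self.covariance_type == 'full':
    covs += torch.eye(covs.shape[0], dtype=covs.dtype, device=covs.device) * 1e-6
else:
    covs += 1e-6
```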
Dear Arne and Jacob,
thank you very much for your clarifications and helpful suggestions! I really appreciate your efforts!
By the way, is this a "duplicate" of https://github.com/jmschrei/pomegranate/issues/1039?
Not exactly a duplicate. `min_cov` doesn't necessarily fix the issue, but it could. Regardless, `min_cov` should be added in too.
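One way `min_cov` could plausibly be wired in is as a floor on the estimated variances, which would also prevent the failure (a sketch, not current pomegranate behavior):

```python
# Sketch for the 'diag'/'sphere' covariance types, where covs holds
# variances: enforce min_cov as a lower bound after estimation.
if self.min_cov is not None:
    covs = torch.clamp(covs, min=self.min_cov)
```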
Dear Jacob,
I ran into a problem with `DenseHMM`, and I saw that some previous issues dealt with similar problems - but because of your recent rewrite, I am not sure whether it has been fixed and I am just doing something wrong. I would appreciate it if you could take a look. A minimal example:
The sine and cosine functions are for encoding time information.
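A setup along the following lines matches the description (the shapes, sequence length, and state count are invented for illustration; the original data will have differed):

```python
import math

import torch
from pomegranate.distributions import Normal
from pomegranate.hmm import DenseHMM

# Invented example data: each timestep carries a sine/cosine time-of-day
# encoding plus one measured value.
t = torch.arange(200, dtype=torch.float32)
features = torch.stack([
    torch.sin(2 * math.pi * t / 24),
    torch.cos(2 * math.pi * t / 24),
    torch.randn(200),
], dim=-1)
tensor_3d = features.unsqueeze(0)  # shape: (n_sequences, length, n_features)

# Many states relative to the structure in the data.
model = DenseHMM([Normal() for _ in range(30)], max_iter=10, verbose=True)
model.fit(tensor_3d)
```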
I am getting the following error:

```
_LinAlgError: linalg.cholesky: The factorization could not be completed because the input is not positive-definite (the leading minor of order 3 is not positive-definite).
```
I do not quite understand the problem. It works (sometimes) if I reduce the number of states:

```python
model = DenseHMM([Normal() for _ in range(10)], max_iter=10, verbose=True)
model.fit(tensor_3d)
```
But I would need more states because my real data are more complex. Do you have any suggestions?
Thank you very much in advance!