jmschrei / pomegranate

Fast, flexible and easy to use probabilistic modelling in Python.
http://pomegranate.readthedocs.org/en/latest/
MIT License
3.29k stars 590 forks source link

Choleski-Decomposition #1068

Closed Koenig128 closed 7 months ago

Koenig128 commented 7 months ago

Dear Jacob,

I ran into a problem with densehmm and I saw that some previous issues dealt with similar problems - but because of your recent rewrite, I am not sure if that was fixed and I am doing something wrong. I would appreciate if you could take a look.

A minimal example:

np.random.seed(42)
data = np.random.rand(20000)
df = pd.DataFrame(data, columns=['value'])
sin_values = np.sin(2 * np.pi * df.index / 24)
cos_values = np.cos(2 * np.pi * df.index / 24)
sin_normalized = (sin_values - sin_values.min()) / (sin_values.max() - sin_values.min())
cos_normalized = (cos_values - cos_values.min()) / (cos_values.max() - cos_values.min())

df['sinday'] = sin_normalized
df['cosday'] = cos_normalized

data_array = df.values
tensor_3d = data_array[np.newaxis, :, :]

model = DenseHMM([Normal(), Normal(), Normal(), Normal(), Normal(),
                 Normal(), Normal(), Normal(), Normal(), Normal()], max_iter=10, verbose=True)
model.fit(tensor_3d)

The sinus and cosinus functions are for encoding time information.

I am getting the following error: _LinAlgError: linalg.cholesky: The factorization could not be completed because the input is not positive-definite (the leading minor of order 3 is not positive-definite).

I do not quite understand the problem. It works (sometimes) if I reduce the number of states:

model = DenseHMM([Normal(), Normal(), Normal(), Normal(), Normal(), Normal(), Normal(), Normal(), Normal(), Normal()], max_iter=10, verbose=True) model.fit(tensor_3d)

But I would need more states because my real data are more complex. Do you have any suggestions?

Thank you very much in advance!

jmschrei commented 7 months ago

This is a recurring problem. Basically, when you have many more states than you need to model your data, some states do not get assigned many/any points because the rest of the states model the data well. Having too few examples can cause an issue with the Cholesky decomposition if there isn't enough variance among them. I know scikit-learn gets around this using some trick -- I need to get around to implementing that. In the meantime, it's basically a sign that your model has too many parameters. If you had more complex real world data you'd be less likely to see this error because the number of states you chose would be needed to capture the heterogeneity of it.

AKuederle commented 7 months ago

I just had a look in the sklearn-code out of interest. They basically add a small value (reg_val) to the terms of the covariance matrix. For example see here: https://github.com/scikit-learn/scikit-learn/blob/093e0cf14aff026cca6097e8c42f83b735d26358/sklearn/mixture/_gaussian_mixture.py#L153

This value is set to 1-6 by default

    reg_covar : float, default=1e-6
        Non-negative regularization added to the diagonal of covariance.
        Allows to assure that the covariance matrices are all positive.

(from https://github.com/scikit-learn/scikit-learn/blob/093e0cf14aff026cca6097e8c42f83b735d26358/sklearn/mixture/_gaussian_mixture.py#L510)

In case the decomposition still fails, the error message suggests to either reduce the number of states, or increase the value of reg_covar

jmschrei commented 7 months ago

Yeah that's what I thought... I'll try adding that in soon. In the meantime, pomegranate is modular enough that you can just copy/paste the Normal class, add this in yourself, and drop those objects into native pomegranate objects.

AKuederle commented 7 months ago

If I understood correctly, it should be possible to subclass Normal and just add a small value to covs calculated in this method.

https://github.com/jmschrei/pomegranate/blob/f8ed453337fae6b44eddcbfe0d1031d33d8bea76/pomegranate/distributions/normal.py#L272


class StableNormal(Normal):

        def from_summaries(self):
        """Update the model parameters given the extracted statistics.

        This method uses calculated statistics from calls to the `summarize`
        method to update the distribution parameters. Hyperparameters for the
        update are passed in at initialization time.

        Note: Internally, a call to `fit` is just a successive call to the
        `summarize` method followed by the `from_summaries` method.
        """

        if self.frozen == True:
            return

        means = self._xw_sum / self._w_sum

        if self.covariance_type == 'full':
            v = self._xw_sum.unsqueeze(0) * self._xw_sum.unsqueeze(1)
            covs = self._xxw_sum / self._w_sum -  v / self._w_sum ** 2.0

        elif self.covariance_type in ['diag', 'sphere']:
            covs = self._xxw_sum / self._w_sum - \
                self._xw_sum ** 2.0 / self._w_sum ** 2.0
            if self.covariance_type == 'sphere':
                covs = covs.mean(dim=-1)

                # This is the magic :)
                covs += 1e-6 

        _update_parameter(self.means, means, self.inertia)
        _update_parameter(self.covs, covs, self.inertia)
        self._reset_cache()
jmschrei commented 7 months ago

Yes, that's fine too. I was thinking you might want to add a user-defined covariance but hardcoding some small number is fine too if it works.

jmschrei commented 7 months ago

Sorry, but to be clear you probably want to add covs += numpy.eye(covs.shape[0]) * 1e-6 under the full covariance type, right? You just want to add it to the diagonal of the covariance matrix, not every element.

AKuederle commented 7 months ago

Yes, that's fine too. I was thinking you might want to add a user-defined covariance but hardcoding some small number is fine too if it works.

Yes, that would of course be the more elegant version. On that note, the ditribution has a parameter called min_cov which seems to be unused. Was this maybe intended for exactly this workaround?

Sorry, but to be clear you probably want to add covs += numpy.eye(covs.shape[0]) * 1e-6 under the full covariance type, right? You just want to add it to the diagonal of the covariance matrix, not every element.

Ah yes, you are right, I misread the scikit-learn code! Only the diagonal makes way more sense.

Koenig128 commented 7 months ago

Dear Arne and Jacob,

thank you very much for your clarifications and helpful suggestions! I really appreciate your efforts!

AKuederle commented 7 months ago

Btw. is this a "duplicate" of https://github.com/jmschrei/pomegranate/issues/1039 ?

jmschrei commented 7 months ago

Not exactly a duplicate. min_cov doesn't necessarily fix the issue but could. Regardless, min_cov should be added in too.