jmschrei / pomegranate

Fast, flexible and easy to use probabilistic modelling in Python.
http://pomegranate.readthedocs.org/en/latest/
MIT License
3.29k stars 590 forks source link

[BUG] Initializing a normal distribution without data leaves it in an invalid state #1024

Open AKuederle opened 1 year ago

AKuederle commented 1 year ago

Describe the bug When creating a new Normal distribution an initializing it, its cov is just a zero matrix. In _reset_cache, this is an unhandled case, as the sum of the cov is 0, and the if case below is missed. In result, most of the buffers are not initialized and calling methods that need them fail.

https://github.com/jmschrei/pomegranate/blob/c77f967a2b66505b42a4fc4063fcf1d26406a9a5/pomegranate/distributions/normal.py#L143

I ran into this issue when trying to automatically initialize a HMM with GMM states, which failed because of that.

To Reproduce

normal = Normal()
normal._initialize(2)
X = np.random.random((10, 2)).astype(np.float32)
normal.log_probability(X)

The last line throws an error, as the _inv_cov buffer does not exist.

AKuederle commented 1 year ago

I guess in general, the case of an empty (Normal) distribution should be handled gracefully, in the sense that it should not throw an error, but rather return probabilities of zero

jmschrei commented 1 year ago

I don't think this is a problem. If you initialize a distribution, you're basically saying that you've allocated memory for all the parameters but do not have values for those parameters. I would not want someone to mistakenly fail to fit their distribution and get erroneous probabilities. That being said, perhaps it should raise a more informative error than the one it does, specifically outlining this.

jmschrei commented 1 year ago

Can you post your code? You should be able to use literally the same distribution objects in both the GMM and the HMM.

AKuederle commented 1 year ago

Sorry for not providing an example for this! I will try to isolate the issue that I found.