jmschrei / pomegranate

Fast, flexible and easy to use probabilistic modelling in Python.
http://pomegranate.readthedocs.org/en/latest/
MIT License
3.29k stars 590 forks

[BUG] HMM edges matrix initialization contains NaNs #1078

Open asn32 opened 5 months ago

asn32 commented 5 months ago

Describe the bug

Hi, I have been testing the DenseHMM implementation in Pomegranate v1.0.3 on models with more than 50 states, and have occasionally hit a bug where building the model.edges matrix with model.add_edge produces an edges matrix containing NaNs. The NaNs then propagate through downstream calculations, including model.forward_backward and model.predict.

I tracked this down to the following snippet:

    if self.edges is None:
        self.edges = torch.empty((n, n), dtype=self.dtype, 
            device=self.device) - inf

where torch.empty returns uninitialized memory that sometimes contains NaNs, and NaN - float("inf") = NaN. In addition, because my tests do not set every entry of the edges matrix to a specific probability, the entries I never touch stay NaN, and any later use of model.edges propagates them.
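A minimal sketch of the mechanism, not pomegranate's code: since torch.empty makes no promise about its contents, the example below uses a hand-built tensor (the name `buffer` is hypothetical) as a deterministic stand-in for uninitialized memory. It shows that finite values become -inf after subtracting inf, while NaN and +inf both become NaN. The log-sum-exp at the end is only an assumed example of a log-space reduction, to show how one bad entry taints a whole row.

```python
import torch

inf = float("inf")

# Hypothetical stand-in for what torch.empty might hand back:
# mostly ordinary finite values, plus one NaN and one +inf.
buffer = torch.tensor([
    [0.0, float("nan"), 3.0],
    [1.5, -2.0, inf],
    [4.0, 5.0, 6.0],
])

# Same arithmetic as the snippet above: subtract inf from every entry.
edges = buffer - inf

# finite - inf = -inf, but NaN - inf = NaN and inf - inf = NaN.
nan_mask = torch.isnan(edges)
print(nan_mask)

# A log-space reduction over a row with a NaN entry yields NaN for that row.
row_scores = torch.logsumexp(edges, dim=1)
print(torch.isnan(row_scores))
```

Only the third row, which started from finite values, ends up as a clean row of -inf.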

To Reproduce

Since the bug (I think) comes from initializing a large array with torch.empty, the simplest way I have found to reproduce it is to run the above snippet with a large n (> 50) and then leave some edges unset.
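A rough reproduction sketch along those lines. The helper name `count_nan_edges` and the trial count are my own choices, not part of pomegranate. Because the contents of torch.empty depend on whatever was previously in memory, the count can legitimately be zero on a given run or machine, so no particular output should be expected.

```python
import torch


def count_nan_edges(n, trials=20, dtype=torch.float32):
    """Count NaN entries produced by `torch.empty((n, n)) - inf` over several trials."""
    inf = float("inf")
    total = 0
    for _ in range(trials):
        # Mirrors the initialization in the snippet above.
        edges = torch.empty((n, n), dtype=dtype) - inf
        total += int(torch.isnan(edges).sum())
    return total


nan_count = count_nan_edges(n=64)
print(f"NaN entries seen across trials: {nan_count}")
```

Running other tensor operations beforehand (so that freed memory holds non-zero data) may make NaNs more likely to show up, though that is a guess about allocator behavior rather than something I have verified.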

The quickest fix I have found is to initialize the matrix with torch.log(torch.zeros((n, n))) instead, which sets every entry to -inf because log(0) = -inf.
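A sketch of that fix, plus an equivalent alternative using torch.full, which fills the matrix with -inf directly and skips the log. The helper `init_log_edges` is hypothetical and is not how the library is currently written; it only illustrates a deterministic initialization.

```python
import torch


def init_log_edges(n, dtype=torch.float32, device="cpu"):
    """Return an (n, n) matrix of -inf, i.e. log-probability 0 for every edge."""
    return torch.full((n, n), float("-inf"), dtype=dtype, device=device)


n = 64
edges = init_log_edges(n)

# The workaround from this report gives the same values: log(0) = -inf.
workaround = torch.log(torch.zeros((n, n)))

print(torch.isnan(edges).any())        # no NaNs, whatever memory held before
print(torch.equal(edges, workaround))  # both initializations agree
```

Either form guarantees that edges never set through model.add_edge stay at -inf instead of inheriting whatever was in memory.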

I'm a big fan of the package, and thank you for all the effort you've put in developing it. Just wanted to put this on your radar.