pymc-devs / pymc

Bayesian Modeling and Probabilistic Programming in Python
https://docs.pymc.io/
Other
8.64k stars 1.99k forks source link

Discuss: PR to fix multinomial precision issues #2469

Closed ericmjl closed 7 years ago

ericmjl commented 7 years ago

Hi team,

I've been experiencing a multinomial sampling problem, first described here, and then surfacing again [here]. The issue that I get is that the pvals sum to greater than 1 with float32 precision.

As I've dug around, I found that the issue is a floating point precision issue in numpy's multinomial. Internally, it casts everything to float64. Issue is discussed on this issue on the numpy repository.

I'm thinking of submitting a very small PR that builds upon @junpenglao's previous PR on distributions/multinomial.py.

def _random(self, n, p, size=None):
    # Set float type to float64 for numpy.
    p = p.astype('float64')
    # Now, re-normalize all of the values in float64 precision.
    p = p / (p.sum(axis=1, keepdims=True))
    if size == p.shape:
       ...

I have done one test using the same notebook in which I first discovered the problem, and now they go away. From an empirical standpoint, the performance of the multinomial classification model is identical to when I had the + 1E6 hack previously described.

I wanted to pitch this first here to see if there's something I'm missing, before taking the time to put in the PR - or should I put in the PR first and solicit code review?

junpenglao commented 7 years ago

Thanks for the digging @ericmjl! I would say you can just go ahead and submit a PR, we can also see if any test failed on Travis.

ColCarroll commented 7 years ago

I've been trying to implement multinomial naive bayes, and keep running into trouble with Multinomial as well -- would love to help out/see if your fix also fixes my woes!

kyleabeauchamp commented 7 years ago

Seems reasonable if the tests pass.

ericmjl commented 7 years ago

Yay, all tests are passing!!!