pymc-devs / pymc

Bayesian Modeling and Probabilistic Programming in Python
https://docs.pymc.io/
Other
8.62k stars 1.99k forks source link

PolyaGamma sampling behaviour changed in 1.3.6->1.3.7 #7417

Open ferrine opened 1 month ago

ferrine commented 1 month ago
          > I have not seen these transforms are even used by PolyaGamma

Can you try pinning polyagamma to 1.3.6 just to see if the tests fail. I just released 1.3.7 which includes numpy 2.0 support. I suspect this change might be the root cause.

Looking at this output:

E           AssertionError: 
E           Arrays are not almost equal to 6 decimals
E           
E           Mismatched elements: 14 / 15 (93.3%)
E           Max absolute difference: 0.25700269
E           Max relative difference: 7.27861386
E            x: array([0.175503, 0.304112, 0.32734 , 0.285465, 0.15268 , 0.208075,
E                  0.324094, 0.261004, 0.256576, 0.173434, 0.115312, 0.034482,
E                  0.121655, 0.047109, 0.358831])
E            y: array([0.358831, 0.047109, 0.121655, 0.034482, 0.115312, 0.173434,
E                  0.256576, 0.261004, 0.324094, 0.208075, 0.15268 , 0.285465,
E                  0.32734 , 0.304112, 0.175503])

It appears that the linked PR leads to values being generated in a reverse order than in v1.3.6....or at at least that's my immediate guess since i'm not quite familiar with the code that is being tested in the CI.

Originally posted by @zoj613 in https://github.com/pymc-devs/pymc/issues/7415#issuecomment-2237491414

Possible resolutions:

zoj613 commented 1 month ago

This example reproduces the behaviour

from polyagamma import random_polyagamma
import numpy as np
rng = np.random.default_rng(12345)

v1.3.6 gives

random_polyagamma(h=[1, 2], size=(3, 2, 2), random_state=rng)
array([[[0.0801582 , 0.07980715],
        [0.3758403 , 0.21884963]],

       [[0.26203975, 0.74751838],
        [0.49877778, 0.69302297]],

       [[0.09164796, 0.21117724],
        [0.10288331, 0.34533088]]])

v1.3.7 gives

random_polyagamma(h=[1, 2], size=(3, 2, 2), random_state=rng)
array([[[0.27257282, 0.17564137],
        [0.06377578, 0.23904942]],

       [[0.09800092, 1.09379983],
        [0.64062812, 0.36893001]],

       [[0.07351574, 0.52117419],
        [0.03288512, 0.12708023]]])

Note that this change in behavior is only present if one uses non-scalar values for the h and z parameters of the distribution. For scalar values the sampling behaviour is identical in both versions.

Do we consider this a breaking change? If so would bumping polyagamma to 2.0.0 and pulling 1.3.7 form pypi be the best solution?

ferrine commented 1 month ago

It seems like it is, @ricardoV94 should have a better opinion on how to jointly handle this

zoj613 commented 1 month ago

I went ahead and re-released v1.3.7 as v2.0.0, which should fix this issue.

ricardoV94 commented 1 month ago

@zoj613 I think it's fine for rngs to come out in a different order. Were we hardcoding the expected results in the PyMC test? If so we can just update for the new order.

More importantly is that there is no mismatch between broadcasted parameters of the distribution, but I assume you have tests for that.

zoj613 commented 1 month ago

@zoj613 I think it's fine for rngs to come out in a different order. Were we hardcoding the expected results in the PyMC test? If so we can just update for the new order.

More importantly is that there is no mismatch between broadcasted parameters of the distribution, but I assume you have tests for that.

The behavior of the rng itself is still identical, the difference is how numpy's Iterator API orders the sampled values since it handles things like broadcasting automatically...versus the manually broadcasting I did before v1.3.7.

I think to it doesn't hurt to bump it to v2.0.0 to avoid any confusion.

zoj613 commented 1 month ago

@zoj613 I think it's fine for rngs to come out in a different order. Were we hardcoding the expected results in the PyMC test? If so we can just update for the new order. More importantly is that there is no mismatch between broadcasted parameters of the distribution, but I assume you have tests for that.

The behavior of the rng itself is still identical, the difference is how numpy's Iterator API orders the sampled values since it handles things like broadcasting automatically...versus the manual broadcasting I did before v1.3.7.

I think to it doesn't hurt to bump it to v2.0.0 to avoid any confusion.