PolyaGamma sampling behaviour changed in 1.3.6->1.3.7

ferrine commented 1 month ago

          > I have not seen these transforms are even used by PolyaGamma

Can you try pinning polyagamma to 1.3.6 just to see if the tests fail. I just released 1.3.7 which includes numpy 2.0 support. I suspect this change might be the root cause.

Looking at this output:

E           AssertionError: 
E           Arrays are not almost equal to 6 decimals
E           
E           Mismatched elements: 14 / 15 (93.3%)
E           Max absolute difference: 0.25700269
E           Max relative difference: 7.27861386
E            x: array([0.175503, 0.304112, 0.32734 , 0.285465, 0.15268 , 0.208075,
E                  0.324094, 0.261004, 0.256576, 0.173434, 0.115312, 0.034482,
E                  0.121655, 0.047109, 0.358831])
E            y: array([0.358831, 0.047109, 0.121655, 0.034482, 0.115312, 0.173434,
E                  0.256576, 0.261004, 0.324094, 0.208075, 0.15268 , 0.285465,
E                  0.32734 , 0.304112, 0.175503])

It appears that the linked PR leads to values being generated in a reverse order than in v1.3.6....or at at least that's my immediate guess since i'm not quite familiar with the code that is being tested in the CI.

Originally posted by @zoj613 in https://github.com/pymc-devs/pymc/issues/7415#issuecomment-2237491414

Possible resolutions:

Pin PolyaGamma to a fixed version for now
Update Tests to respect the updated version
Test that either order is correct???

zoj613 commented 1 month ago

This example reproduces the behaviour

from polyagamma import random_polyagamma
import numpy as np
rng = np.random.default_rng(12345)

v1.3.6 gives

random_polyagamma(h=[1, 2], size=(3, 2, 2), random_state=rng)
array([[[0.0801582 , 0.07980715],
        [0.3758403 , 0.21884963]],

       [[0.26203975, 0.74751838],
        [0.49877778, 0.69302297]],

       [[0.09164796, 0.21117724],
        [0.10288331, 0.34533088]]])

v1.3.7 gives

random_polyagamma(h=[1, 2], size=(3, 2, 2), random_state=rng)
array([[[0.27257282, 0.17564137],
        [0.06377578, 0.23904942]],

       [[0.09800092, 1.09379983],
        [0.64062812, 0.36893001]],

       [[0.07351574, 0.52117419],
        [0.03288512, 0.12708023]]])

Note that this change in behavior is only present if one uses non-scalar values for the h and z parameters of the distribution. For scalar values the sampling behaviour is identical in both versions.

Do we consider this a breaking change? If so would bumping polyagamma to 2.0.0 and pulling 1.3.7 form pypi be the best solution?

ferrine commented 1 month ago

It seems like it is, @ricardoV94 should have a better opinion on how to jointly handle this

zoj613 commented 1 month ago

I went ahead and re-released v1.3.7 as v2.0.0, which should fix this issue.

ricardoV94 commented 1 month ago

@zoj613 I think it's fine for rngs to come out in a different order. Were we hardcoding the expected results in the PyMC test? If so we can just update for the new order.

More importantly is that there is no mismatch between broadcasted parameters of the distribution, but I assume you have tests for that.

zoj613 commented 1 month ago

@zoj613 I think it's fine for rngs to come out in a different order. Were we hardcoding the expected results in the PyMC test? If so we can just update for the new order.

More importantly is that there is no mismatch between broadcasted parameters of the distribution, but I assume you have tests for that.

The behavior of the rng itself is still identical, the difference is how numpy's Iterator API orders the sampled values since it handles things like broadcasting automatically...versus the manually broadcasting I did before v1.3.7.

I think to it doesn't hurt to bump it to v2.0.0 to avoid any confusion.

zoj613 commented 1 month ago

@zoj613 I think it's fine for rngs to come out in a different order. Were we hardcoding the expected results in the PyMC test? If so we can just update for the new order. More importantly is that there is no mismatch between broadcasted parameters of the distribution, but I assume you have tests for that.

The behavior of the rng itself is still identical, the difference is how numpy's Iterator API orders the sampled values since it handles things like broadcasting automatically...versus the manual broadcasting I did before v1.3.7.

I think to it doesn't hurt to bump it to v2.0.0 to avoid any confusion.

pymc-devs / pymc

PolyaGamma sampling behaviour changed in 1.3.6->1.3.7 #7417