pymc-devs / pymc

Bayesian Modeling and Probabilistic Programming in Python
https://docs.pymc.io/

BUG: test case `test_replace_vars_in_graphs_nested_reference` is flaky #7169

Closed aerubanov closed 5 months ago

aerubanov commented 5 months ago

Describe the issue:

test_replace_vars_in_graphs_nested_reference in tests/test_pytensorf.py fails intermittently (roughly 1 run in 50), both locally and in the CI GitHub action (for example, https://github.com/pymc-devs/pymc/actions/runs/8005225256/job/21864183620).

Reproducible code example:

import numpy as np
import pymc as pm
import pytest

from pymc.pytensorf import replace_vars_in_graphs


@pytest.mark.parametrize("exec_number", range(100))
def test_replace_vars_in_graphs_nested_reference(exec_number):
    # Replace both `x` and `y`, where the replacement of y references `x`
    x = pm.HalfNormal.dist(1e-3, name="x")
    neg_x = -x
    y = pm.Uniform.dist(neg_x, x, name="y")
    x_value = x.clone()
    y_value = y.clone()
    replacements = {x: x_value, y: neg_x + y_value}
    [new_x, new_y] = replace_vars_in_graphs([x, y], replacements=replacements)
    assert new_x.eval({x_value: 100}) == 100
    assert new_y.eval({x_value: 100, y_value: 1}) == -99
    assert new_y.eval({neg_x: 100, y_value: 1}) == 101
    assert np.abs(x.eval()) < 1
    # Confirm the original `y` variable is changed in place
    # This is unavoidable if we want to respect the identity of the replacement variables
    # As when imputing `neg_x` and `x` while evaluating `new_y` above and below.
    assert np.abs(y.eval({x_value: 100})) > 1
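The final assertion is probabilistic: once `x` is replaced in place by `x_value = 100`, the original `y` effectively becomes a `Uniform(-100, 100)` draw, so `|y| > 1` fails whenever the draw lands in (-1, 1). A minimal Monte Carlo sketch (assuming that effective distribution) estimates how often that happens:

```python
import numpy as np

# Hypothetical check: approximate the original `y` after in-place replacement
# as a plain Uniform(-100, 100) draw and estimate P(|y| <= 1), which is the
# probability that the test's final assertion fails.
rng = np.random.default_rng(42)
draws = rng.uniform(-100, 100, size=1_000_000)
flake_rate = np.mean(np.abs(draws) <= 1)
print(f"estimated failure probability: {flake_rate:.4f}")  # ≈ 0.01
```

At roughly a 1% failure probability, seeing 2 failures in 100 parametrized runs (as in the report below) is entirely consistent with random variation.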

Error message:

====================================================================== short test summary info =======================================================================
FAILED tests/test_pytensorf.py::test_replace_vars_in_graphs_nested_reference[30] - AssertionError: assert 0.8356647400415085 > 1
FAILED tests/test_pytensorf.py::test_replace_vars_in_graphs_nested_reference[65] - AssertionError: assert 0.35686940584363924 > 1
============================================================== 2 failed, 98 passed, 6 warnings in 5.28s ==============================================================

>       assert np.abs(y.eval({x_value: 100})) > 1
E       AssertionError: assert 0.35686940584363924 > 1
E        +  where 0.35686940584363924 = <ufunc 'absolute'>(array(-0.35686941))
E        +    where <ufunc 'absolute'> = np.abs
E        +    and   array(-0.35686941) = <bound method Variable.eval of y>({x: 100})
E        +      where <bound method Variable.eval of y> = y.eval

PyMC version information:

pymc: main branch at 74748c71ff0
pytensor: 2.18.6
os: linux (manjaro)

Context for the issue:

No response

welcome[bot] commented 5 months ago

:tada: Welcome to PyMC! :tada: We're really excited to have your input into the project! :sparkling_heart:
If you haven't done so already, please make sure you check out our Contributing Guidelines and Code of Conduct.

aerubanov commented 5 months ago

Might be connected to https://github.com/pymc-devs/pymc/pull/7114

ricardoV94 commented 5 months ago

@aerubanov can you check whether the pytensor.dprint output changes across runs / when it fails (other than the RNG variables)?

aerubanov commented 5 months ago

@ricardoV94 Here is the pytensor.dprint(y) output for a passing test:

uniform_rv{0, (0, 0), floatX, False}.1 [id A] 'y'
 ├─ RandomGeneratorSharedVariable(<Generator(PCG64) at 0x7F7B3A7F8BA0>) [id B]
 ├─ [] [id C]
 ├─ 11 [id D]
 ├─ Neg [id E]
 │  └─ x [id F]
 └─ halfnormal_rv{0, (0, 0), floatX, False}.1 [id G] 'x'
    ├─ RandomGeneratorSharedVariable(<Generator(PCG64) at 0x7F7B3A7F8040>) [id H]
    ├─ [] [id I]
    ├─ 11 [id J]
    ├─ 0.0 [id K]
    └─ 0.001 [id L]

and for a failing one:

uniform_rv{0, (0, 0), floatX, False}.1 [id A] 'y'
 ├─ RandomGeneratorSharedVariable(<Generator(PCG64) at 0x7FC7DE640E40>) [id B]
 ├─ [] [id C]
 ├─ 11 [id D]
 ├─ Neg [id E]
 │  └─ x [id F]
 └─ halfnormal_rv{0, (0, 0), floatX, False}.1 [id G] 'x'
    ├─ RandomGeneratorSharedVariable(<Generator(PCG64) at 0x7FC7DE35F5A0>) [id H]
    ├─ [] [id I]
    ├─ 11 [id J]
    ├─ 0.0 [id K]
    └─ 0.001 [id L]

Looks the same to me.

ricardoV94 commented 5 months ago

Yeah. Is the failure reasonable under random variation?

ricardoV94 commented 5 months ago

y should be U(0, x)? Something is off about the test, but I don't remember the idea.

ricardoV94 commented 5 months ago

I guess the idea was that by increasing the scale it's unlikely a draw would land in the (-1, 1) range, but that's still a 1% chance. We could use a (-10_000, 10_000) range instead, and add a comment that the test should fail about 1 in 10,000 times.