kymata-atlas / kymata-core

Core Kymata codebase, including statistical analysis and plotting tools
https://kymata.org
MIT License

Do we know why random tests are failing? #388

Open neukym opened 3 weeks ago

neukym commented 3 weeks ago

Do we know why random tests are failing during CI? They fail, and then pass on the second attempt - it's a bit annoying for something that should be reproducible.

caiw commented 3 weeks ago

It's bizarre, isn't it? Here is one example failure:

=========================== short test summary info ============================
FAILED tests/test_expression.py::test_hes_rename_functions_just_one - assert <kymata.entit...x7f36ca0be350> == <kymata.entit...x7f36ca193010>
...
  Full diff:
  - <kymata.entities.expression.HexelExpressionSet object at 0x7f36ca193010>
  ?                                                                  ^^ ^^
  + <kymata.entities.expression.HexelExpressionSet object at 0x7f36ca0be350>
  ?                                                                  ^^^ ^
=========== 1 failed, 146 passed, 3 skipped, 290 warnings in 11.28s ============
Error: Process completed with exit code 1.

And the intermittent failures I've inspected are all similar: an equality assertion failing between two objects that should compare equal (pytest then prints their reprs, which is why the diff shows two different memory addresses).

However, looking at that test, test_hes_rename_functions_just_one():

import numpy as np

from kymata.entities.expression import HexelExpressionSet


def test_hes_rename_functions_just_one():
    data_left = [np.random.randn(5, 10) for _ in range(2)]
    data_right = [np.random.randn(5, 10) for _ in range(2)]

    es = HexelExpressionSet(
        functions=["first", "second"],
        hexels_lh=range(5),
        hexels_rh=range(5),
        latencies=range(10),
        data_lh=data_left,
        data_rh=data_right,
    )
    target_es = HexelExpressionSet(
        functions=["first_renamed", "second"],
        hexels_lh=range(5),
        hexels_rh=range(5),
        latencies=range(10),
        data_lh=data_left,
        data_rh=data_right,
    )
    assert es != target_es
    es.rename(functions={"first": "first_renamed"})
    assert es == target_es

That should not be failing intermittently.

Here's the only thing I can imagine:

Tests for ExpressionSet equality are done by comparing data blocks (and other things). This involves many float comparisons, and the data blocks in this test are randomly generated. Perhaps some random floats fail the equality comparison because of some spooky nondeterministic behaviour in the GitHub CI runner. (I've never made it fail by running locally.)

If this is the case, then using non-random data would at least fix this issue, even if it didn't fully explain it.
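One low-effort way to get non-random data (a sketch, not code from this repo; the helper name is made up) is to build the test blocks from a seeded NumPy Generator, so every run of the test sees bit-identical "random" data:

```python
import numpy as np


def make_test_data(seed: int = 0):
    # A seeded Generator produces identical blocks on every run,
    # so a CI failure could no longer be blamed on the data.
    rng = np.random.default_rng(seed)
    data_left = [rng.standard_normal((5, 10)) for _ in range(2)]
    data_right = [rng.standard_normal((5, 10)) for _ in range(2)]
    return data_left, data_right


# Two independent calls yield bit-identical arrays.
left_a, right_a = make_test_data()
left_b, right_b = make_test_data()
assert all(np.array_equal(a, b) for a, b in zip(left_a, left_b))
assert all(np.array_equal(a, b) for a, b in zip(right_a, right_b))
```

That would also make any future failure reproducible locally, since the exact failing data could be regenerated from the seed.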

caiw commented 3 weeks ago

Or it could be nondeterministic behaviour in the sparse package, I suppose, which converts the numpy.arrays to sparse.COOs in HexelExpressionSet.__init__... I really don't know how we could track that down though unless we can find a specific example of test data which causes the test to fail.

caiw commented 3 weeks ago

Here is another example of a randomly failing test involving float comparisons but not involving sparse...

caiw commented 3 weeks ago

One thing which might work would be to force (e.g.) numpy.float32s in all tests, in case it's some kind of heterogeneous-architecture issue with the GitHub CI runners.
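A minimal sketch of that coercion (the helper name is hypothetical, not from the codebase):

```python
import numpy as np


def as_float32(blocks):
    # Hypothetical helper: coerce every test data block to a fixed dtype,
    # so runners with different default precisions behave identically.
    return [np.asarray(b, dtype=np.float32) for b in blocks]


data = [np.random.randn(5, 10) for _ in range(2)]
coerced = as_float32(data)
assert all(b.dtype == np.float32 for b in coerced)
```

Applying this to data_lh and data_rh before constructing each HexelExpressionSet would at least remove dtype variation as a suspect.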