marrink-lab / vermouth-martinize

Describe and apply transformation on molecular structures and topologies
Apache License 2.0
86 stars 38 forks source link

test_redistributed_kdtree is flaking #437

Closed pckroon closed 2 years ago

pckroon commented 2 years ago

The test for the redistributed KDTree (https://github.com/marrink-lab/vermouth-martinize/blob/master/vermouth/tests/test_redistributed_kdtree.py#L36) is flaking (i.e. Hypothesis sometimes finds issues, sometimes doesn't). Commit ad084fb72413ed459de1d76fcc0f6e5b973c0446 merged in #427 was supposed to resolve this, but didn't. I think part of the cause is that hypothesis simply became better at finding issues.

For example:

Falsifying example: test_sparse_distance_matrix(
    coordinates=array([[0.00000000e+00, 6.10351562e-05],
           [5.00000000e-01, 0.00000000e+00],
           [0.00000000e+00, 0.00000000e+00]]), max_dist=0.5, p=4,
)
E       assert {(0, 0): 0.0,\n (0, 1): 0.5,\n (0, 2): 6.103515625e-05,\n (1, 0): 0.5,\n (1, 1): 0.0,\n (1, 2): 0.5,\n (2, 0): 6.103515625e-05,\n (2, 1): 0.5,\n (2, 2): 0.0} == approx({(0, 0): 0.0 ± 1.0e-12, (0, 1): 0.0 ± 1.0e-12, (0, 2): 6.103515625e-05 ± 6.1e-11, (1, 0): 0.0 ± 1.0e-12, (1, 1): 0.0 ± 1.0e-12, (1, 2): 0.5 ± 5.0e-07, (2, 0): 6.103515625e-05 ± 6.1e-11, (2, 1): 0.5 ± 5.0e-07, (2, 2): 0.0 ± 1.0e-12})
E         comparison failed. Mismatched elements: 2 / 9:
E         Max absolute difference: 0.5
E         Max relative difference: inf
E         Index  | Obtained | Expected     
E         (0, 1) | 0.5      | 0.0 ± 1.0e-12
E         (1, 0) | 0.5      | 0.0 ± 1.0e-12

Given the array above, the distance between points 0 and 1 will be 0.5 + 6.10351562e-05, which is more than the max_dist=0.5. Therefor, the distance between 0 and 1 should not be included (and be 0). However, the redistributed KDTree does find this distance <= max_dist, and includes it anyway. Switching a <= in redistributed.kdtree.sparse_distance_matrix to < will not work either, hypothesis will simply find an example that should be included but isn't.

I'm not really in the mood to debug this bit of code. I suggest making scipy a hard dependency, and removing the redistributed KDTree altogether. I would suggest waiting for scipy 1.9.0 to see which wheels it distributes, to make sure there's nothing important missing. Until then I suggest simply rerunning tests where this specific test is the only cause for failure. @fgrunewald opinion?

fgrunewald commented 2 years ago

@pckroon I never really understood why script wasn't a hard dependency. I never had problems with that in polyply so far except for the occasional API change but that is easily detected and resolved. And less code is less liability. So as far as I'm concerned let's go ahead.

pckroon commented 2 years ago

The main reasoning was that for a while (before scipy 1.0 maybe?) they didn't distribute wheels on pypi (maybe just not for windows), which makes installing much harder. Scipy 1.9.0rc1 is already released, and its set of wheels looks pretty complete [1]. So let's just do it. I'll open a PR.

[1] https://pypi.org/project/scipy/1.9.0rc1/#files