Closed dennisbrookner closed 1 year ago
I agree this sounds like a good idea, and is consistent with how users would expect this sort of consistency to be achieved.
I think this would be a good "first issue" for someone to work on. In the meantime, I will note that setting the random seed for numpy
prior to the call to rs.utils.add_rfree()
should get you a reproducible result:
In [1]: import numpy as np
In [2]: import reciprocalspaceship as rs
In [3]: ds = rs.read_mtz("tests/data/fmodel/9LYZ.mtz")
In [4]: def test_noseed(ds):
...: flags1 = rs.utils.add_rfree(ds, fraction=0.5)["R-free-flags"]
...: flags2 = rs.utils.add_rfree(ds, fraction=0.5)["R-free-flags"]
...: return np.array_equal(flags1, flags2)
...:
In [5]: def test_seed(ds):
...: np.random.seed(100)
...: flags1 = rs.utils.add_rfree(ds, fraction=0.5)["R-free-flags"]
...: np.random.seed(100)
...: flags2 = rs.utils.add_rfree(ds, fraction=0.5)["R-free-flags"]
...: return np.array_equal(flags1, flags2)
...:
In [6]: test_noseed(ds)
Out[6]: False
In [7]: test_seed(ds)
Out[7]: True
I just want to note in case anyone finds this thread in the future that
np.random.seed(seed)
np.random.random()
and
rng = np.random.default_rng(seed)
rng.random()
are actually not equivalent, because they use different random number generation algorithms. I only mention this because if you used the np.random.seed
workaround above, and then switched over to the new rs
function that uses np.random.default_rng
, your flags would change.
I'm quite certain that this will not actually apply to anyone, because rs
will contain the new functionality quite soon (and already does on the rfree
branch), but I figured I would mention it here just in case.
I just had the unfortunate experience of re-running a cell in a jupyter notebook and regenerating all of my R-free flags ðŸ˜. While that is of course totally my fault, I'm also sure I won't be the last to do this.
To me, the solution is to allow a user to seed the random number generator, and then get reproducible R-free flags.
e.g.
would become something like
This has the side benefit of switching us over to the new version of
numpy
random number generators, which I'm just now learning about, but are a definite improvement.