mne-tools / mne-python

MNE: Magnetoencephalography (MEG) and Electroencephalography (EEG) in Python
https://mne.tools
BSD 3-Clause "New" or "Revised" License
2.66k stars 1.31k forks source link

API: RandomState deprecation #9233

Open larsoner opened 3 years ago

larsoner commented 3 years ago

np.random.RandomState is not a good choice for random number generation going forward:

RandomState is effectively frozen and will only receive updates that are required by changes in the the internals of Numpy. More substantial changes, including algorithmic improvements, are reserved for Generator.

I don't understand all the details, but the TL;DR I've gathered from monitoring some SciPy discussions is that it's not as good as the new default_rng / Generator-based way of making random numbers and it should be replaced in people's code.

In the long term I think we want to switch random_state=0 to use default_rng(random_state) instead of RandomState(random_state) internally. I propose:

  1. Have a mne.set_rng_backend(backend) that can be 'RandomState' or 'Generator', maybe with a use_rng_backend context manager. The context manager is maybe overkill, but I do think it might help people transition code from old to new, and it's only a few lines for us.
  2. If this has not been called by the user yet, if they do random_state=<int> they will see a DeprecationWarning stating that the default is 'RandomState' in 0.23 but will change to 'Generator' in 0.24 (or maybe this is a good case for making it 0.25 to give people more time to change over), and we will call set_rng_backend('RandomState').
  3. In 0.24 (or 0.25) we no longer emit this warning and just use 'Generator'.

I think this is a workable solution for modernizing our random_state params.

drammock commented 3 years ago

The new Generator class docstring says:

No Compatibility Guarantee

Generator does not provide a version compatibility guarantee. In particular, as better algorithms evolve the bit stream may change.

I think it's worth asking what we use RNGs for.

larsoner commented 3 years ago

I think for us we want everything to always be reproducible for a given int random_state (and a given mne.set_rng_backend). I hadn't looked at Generator / default_rng, it is indeed a problem that it does not guarantee stability.

I wonder if we should just switch EDIT: via deprecation as above to PCG64, which is guaranteed to be stable:

Compatibility Guarantee

PCG64 makes a guarantee that a fixed seed and will always produce the same random integer stream.

But is better than RandomState:

By default, Generator uses bits provided by PCG64 which has better statistical properties than the legacy MT19937 used in RandomState.

drammock commented 3 years ago

agree on the plan to switch (with deprecation cycle) to explicitly use PCG64 (which happens be be the current default for np.random.default_rng, but that might change).

agramfort commented 3 years ago

scipy will deprecate RandomState ?

scikit-learn has not switched yet so I don't feel the urgence

larsoner commented 3 years ago

scipy will deprecate RandomState ?

There is a long ongoing discussion in the SciPy PR DOC: remove references to RandomState (#13778). The short version so far to me is that RandomState will continue to work at least for some time but should not (and will not) be the preferred way to do things in examples and parameters.

rkern commented 3 years ago

Hi everyone! I'm the cause of everyone's problems here. As you work through these issues, I do recommend reading NEP 19 which goes into more detail about why we had the old backwards-compatibility policy and why we moved away from it. The concerns you have brought up here were definitely in our thoughts. I'm happy to answer any questions that the NEP doesn't answer.

I can't speak definitively as to when/how/if scipy really deprecates RandomState per se, but I have my guess as to where we're going on the aforementioned PR. Some time after we drop support for numpy 1.16, I expect that we will transition the code to use Generator instead of RandomState. I expect that we'll still accept RandomState instances, but we will wrap its core uniform PRNG in Generator and call its methods instead. So if you call scipy.stats.norm.rvs(random_state=RandomState(1234)), you will get the results from Generator.standard_normal()'s improved algorithm rather than the older, slower RandomState.standard_normal(). The core Mersenne Twister PRNG underneath the passed RandomState will still be used, however. Again, we haven't made any decisions, yet, but I think that's where we are going.

larsoner commented 3 years ago

Thanks for the info @rkern . After some reading and discussion I think our plan is to shelve this for now until we bump our minimum numpy requirement to 1.17+ and then switch everything over to using Generator syntax internally (probably still using MT19937 for backward compat), which will likely happen in the next year.

rkern commented 3 years ago

Sounds like a good plan.