numpy / numpy

The fundamental package for scientific computing with Python.
https://numpy.org

ENH: RandomState should expose an API as similar as possible to numpy.random #13121

Open clbarnes opened 5 years ago

clbarnes commented 5 years ago

As far as possible, an instance of RandomState should be a drop-in replacement for the numpy.random module, so that either can be passed transparently into a randomising procedure without having to check which methods the procedure calls.
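
To illustrate the drop-in idea, here is a minimal sketch, assuming a hypothetical library function `jitter` (not part of NumPy) that accepts either the numpy.random module or a RandomState instance. It only calls `random_sample`, which both objects provide.

```python
import numpy as np


def jitter(values, rng=np.random):
    """Add uniform noise from [0, 1) to each value.

    `jitter` is a hypothetical example function. `rng` may be the
    numpy.random module itself or a RandomState instance; only
    `random_sample` is called, which exists on both.
    """
    values = np.asarray(values, dtype=float)
    return values + rng.random_sample(values.shape)


# Unseeded call: falls back to the module-level functions.
noisy = jitter([0.0, 1.0, 2.0])

# Seeded calls: two RandomState instances with the same seed
# produce the same noise, without touching the global state.
a = jitter([0.0, 1.0, 2.0], rng=np.random.RandomState(42))
b = jitter([0.0, 1.0, 2.0], rng=np.random.RandomState(42))
```

The sketch works only because the function restricts itself to a method present on both objects; a body that called one of the module-only aliases would fail when handed a RandomState.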

There exist a number of aliases for np.random.random_sample which are not present in RandomState:

random, in particular, is very commonly used, as it's the direct counterpart to Python's built-in random.random. A cursory Google search suggests that np.random.random is mentioned more often than np.random.random_sample in tutorials. The lack of these aliases on RandomState, then, represents an impediment to porting determinism into existing code relying on unseeded numpy.random free functions (in cases where global seeding is not desired), e.g. https://github.com/aestrivex/bctpy/issues/67.
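
One way to see which names are affected on a given installation is to probe both objects with `hasattr`. This is only a sketch: `random` comes from the discussion above, while `ranf` and `sample` are an assumption about which other aliases of `random_sample` are meant. The result depends on the installed NumPy version, so no output is shown.

```python
import numpy as np

# Candidate aliases of random_sample. "random" is named in the issue;
# "ranf" and "sample" are assumed here and may not apply to every version.
ALIASES = ("random", "ranf", "sample")

rs = np.random.RandomState(0)

# Map each name to (present on module, present on RandomState instance).
availability = {
    name: (hasattr(np.random, name), hasattr(rs, name))
    for name in ("random_sample",) + ALIASES
}

for name, (on_module, on_instance) in availability.items():
    print(f"{name:14s} module={on_module}  RandomState={on_instance}")
```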

eric-wieser commented 5 years ago

In some sense I'd consider the status quo a feature, since "There should be one-- and preferably only one --obvious way to do it.".

We're stuck with the old aliases because removing them would break existing code, but I'd prefer not to add new ones.

clbarnes commented 5 years ago

I agree with that; it's just a pain that so much code uses the random alias in particular. I'm happy for this to be closed if simplicity is a greater concern than compatibility.

Would it be worth noting in the docs that random_sample is the preferred usage and that others are compatibility aliases? At least we could prevent any new code getting into this situation.

eric-wieser commented 5 years ago

Would your particular problem be solved by us providing a get_random_state method returning the global instance used by the module-level functions?

clbarnes commented 5 years ago

Yes, I think so - the issue is allowing library functions to use their own RandomState instance if given a seed, and the global one if not. If we could use the actual global RandomState instance instead of the free-floating aliases to its methods, then obviously the same interface is guaranteed (it just means refactoring legacy code to not use the random_sample aliases).
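
The pattern described here can be sketched as follows. The proposed get_random_state does not exist in this discussion's NumPy, so the sketch assumes the private attribute `np.random.mtrand._rand` as a stand-in for the global instance; being private, it may change without notice. The helper names `resolve_random_state` and `random_weights` are hypothetical.

```python
import numpy as np


def resolve_random_state(seed=None):
    """Return a RandomState to draw from (hypothetical helper).

    - seed is None: the global instance behind the module-level functions.
      ASSUMPTION: accessed via the private np.random.mtrand._rand.
    - seed is a RandomState: returned unchanged.
    - anything else: used to seed a fresh, independent RandomState.
    """
    if seed is None:
        return np.random.mtrand._rand
    if isinstance(seed, np.random.RandomState):
        return seed
    return np.random.RandomState(seed)


def random_weights(n, seed=None):
    """Example library function: n uniform samples from [0, 1)."""
    rng = resolve_random_state(seed)
    # Same call regardless of where the RandomState came from.
    return rng.random_sample(n)


# Seeded: reproducible and independent of the global state.
w1 = random_weights(3, seed=1)
w2 = random_weights(3, seed=1)

# Unseeded: follows the global state, so np.random.seed still applies.
np.random.seed(5)
g1 = random_weights(3)
np.random.seed(5)
g2 = np.random.random_sample(3)
```

Because every code path ends up holding a RandomState, the library body never needs the module-level aliases, which is the refactoring mentioned above.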

I can do this, unless there's anything about numpy likely to trip me up!