Closed cbrnr closed 8 years ago
We could add a seed
option to every function/method that uses random numbers.
I don't like messing with the numpy global seed inside our functions. It's probably better to use an explicit instance of random state that can be passed to functions instead of a seed. (Default is to use global state)
scikit-learn functions have a random_state
argument: http://scikit-learn.org/stable/modules/generated/sklearn.cross_validation.KFold.html#sklearn.cross_validation.KFold
def check_random_state(seed):
"""Turn seed into a np.random.RandomState instance
If seed is None, return the RandomState singleton used by np.random.
If seed is an int, return a new RandomState instance seeded with seed.
If seed is already a RandomState instance, return it.
Otherwise raise ValueError.
"""
if seed is None or seed is np.random:
return np.random.mtrand._rand
if isinstance(seed, (numbers.Integral, np.integer)):
return np.random.RandomState(seed)
if isinstance(seed, np.random.RandomState):
return seed
raise ValueError('%r cannot be used to seed a numpy.random.RandomState'
' instance' % seed)
That was exactly what I had in mind.
The Workspace
can have its own RandomState
that is passed to all functions.
All other functions/classes take either a RandomState
argument, a seed for constructing a new RandomState
, or they default to using numpy's global state.
This is even perfectly backwardscompatible.
But do you think the Workspace
can set the random state, i.e. ws.seed(123)
? This would then set the Numpy seed.
I think it's bad practice to implicitly change the behavior of a completely different package.
Consider this:
ws.seed(123) # calls np.random.seed(123)
x = np.random.randn() # this is affected by the seed
If we do a ws.seed(123)
everything outside the workspace should be unaffected.
OK. But this means we can still have a ws.seed(123)
, which sets the seed within the workspace (which is passed on to all methods and functions).
Yes. Well, not the seed, but the RandomState object.
OK, got it! Now which functions need a random_state
argument?
I think only these: Whiteness test, connectivity statistics, VAR simulation.
I think all ICA functions too.
True. We have to check our modification of Infomax. Maybe I edited out similar functionality.
In #89, we added a random seed to our examples. I think it would be nice if we had an option to set the seed globally, e.g.
ws.set_seed
for the OO API.