tensorflow / skflow

Simplified interface for TensorFlow (mimicking Scikit Learn) for Deep Learning
Apache License 2.0

Default random seed is 42 rather than defaulting to a random random seed #92

Closed quispiam closed 8 years ago

quispiam commented 8 years ago

Hi,

I've been working through some of the tutorials and they often call random.seed() at the beginning. I tried playing with this value to see how it affected the output of a DNN, and it doesn't change anything.

A little digging found that by default tf_random_seed in dnn.py is 42, and it must be specified when TensorFlowDNNClassifier() is created if you want anything other than 42.

I found this somewhat confusing and, unless there are other reasons for setting a default random seed in dnn.py (as opposed to the user doing this in their own code), I would argue that the default behaviour should leave tf_random_seed undefined, e.g. tf_random_seed=None instead of tf_random_seed=42.
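To make the behaviour concrete, here is a minimal sketch in plain Python. It does not use skflow or TensorFlow: ToyClassifier and initial_weights are hypothetical names, and a private random.Random instance only stands in for the library's own generator. It assumes the estimator's seed is fully independent of the global random.seed(), which is what I observed.

```python
import random


class ToyClassifier:
    """Hypothetical stand-in for an estimator that owns its random seed."""

    def __init__(self, tf_random_seed=42):
        # A private generator plays the role of the framework's own RNG.
        # With a seed of None, random.Random seeds itself from OS entropy.
        self._rng = random.Random(tf_random_seed)

    def initial_weights(self, n=3):
        return [self._rng.random() for _ in range(n)]


# Changing the global seed has no effect on the default-seeded estimator.
random.seed(1)
a = ToyClassifier().initial_weights()
random.seed(2)
b = ToyClassifier().initial_weights()
same_by_default = (a == b)

# Only an explicit seed on the estimator changes the result.
c = ToyClassifier(tf_random_seed=7).initial_weights()
differs_with_explicit_seed = (a != c)

# With None as the seed, each instance starts from fresh entropy.
d = ToyClassifier(tf_random_seed=None).initial_weights()
e = ToyClassifier(tf_random_seed=None).initial_weights()
unseeded_runs_differ = (d != e)
```

With a default of None, the last case would be what users get unless they ask for reproducibility explicitly.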

Thanks for making this awesome project and tutorials, I'm finding them really helpful in my exploration of machine learning!

ilblackdragon commented 8 years ago

The reason it's done this way is that TensorFlow's random number generation does not use random.seed, so you need to set TensorFlow's own seed (tf.set_random_seed) to actually get reproducible behaviour. The seed also has to be set when the graph is created, which is why it needs to be set inside TensorFlowEstimator.
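A rough sketch of the timing point, in plain Python rather than TensorFlow: ToyGraph, random_op and build_and_run are hypothetical names, and the assumption is simply that a random op picks up the graph seed at the moment it is created, so a seed set afterwards is ignored by that op.

```python
import random


class ToyGraph:
    """Hypothetical stand-in for a computation graph with a graph-level seed."""

    def __init__(self):
        self.seed = None

    def random_op(self):
        # The op captures the graph seed at creation time, not at run time.
        rng = random.Random(self.seed)
        return lambda: rng.random()


def build_and_run(seed, seed_first):
    graph = ToyGraph()
    if seed_first:
        graph.seed = seed
    op = graph.random_op()
    if not seed_first:
        graph.seed = seed  # too late: the op already exists
    return op()


# Seeding before the op is created gives the same value on every build.
reproducible = build_and_run(42, True) == build_and_run(42, True)

# Seeding after the op is created leaves it unseeded, so builds differ.
seeded_too_late_matches = build_and_run(42, False) == build_and_run(42, False)
```

This is why the estimator takes the seed as a constructor argument and applies it itself while building the graph, instead of relying on the caller to seed at the right moment.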

Now, you are right that random.seed(42) at the beginning of the examples is not needed anymore, because the seed is now set in the data feeder. I'll remove random.seed from the example code.

terrytangyuan commented 8 years ago

@ilblackdragon I just removed them from the example code so other people won't get confused. :-)