BTW: it would be great if Keras could expose a unified API for reproducible training.
Right. I will look into it. Or does anybody else want to take a look at it?
I ran into this problem in Edward; here is the fix we went with after a rather long discussion: https://github.com/blei-lab/edward/pull/184. Long story short, it is pretty hard to seed TensorFlow if you have a single shared session. I would be very interested to hear if there is a better solution :)
Any update on this?
Correct me if I'm wrong, but looks like this issue is still open and there is no way currently in Keras with a TensorFlow backend to get reproducible results. Any update? Workaround?
Well, there is this hack https://github.com/blei-lab/edward/pull/184. I can propose a PR with that for Keras if it makes sense, @fchollet?
The solution is to simply add a set_seed() function, but raise an error if someone calls it after a TF variable has been created. You cannot reseed after a Variable has been created, because the previous seed was already used to build its initializers.
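A rough sketch of what such a guard could look like (a hypothetical helper, not an existing Keras/TF API; assumes the TF 1.x graph API):

import numpy as np
import tensorflow as tf

def set_seed(seed):
    """Hypothetical helper: seed NumPy and TensorFlow, but refuse to run once
    variables exist, since their initializers already captured the old seed."""
    if tf.global_variables():  # any Variable already created in the default graph
        raise RuntimeError("set_seed() must be called before any TF variable is created.")
    np.random.seed(seed)
    tf.set_random_seed(seed)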
Any news on this issue? @bluelight773 I think it's reproducible when running on the CPU, but that is not really an option most of the time.
@fchollet @zhuoqiang Could you confirm this?
Maybe there is a workaround: program with both Keras and TensorFlow directly, following this post: https://blog.keras.io/keras-as-a-simplified-interface-to-tensorflow-tutorial.html
Use Keras predefined layers to speed up building your model, but use TensorFlow for input, output, and optimization. Take a look at this code; it seems to reproduce the result. I use a CentOS 7 server with a Tesla K40, and it always shows 0.6268 as the result.
>>> keras.__version__
'1.1.1'
>>> tf.__version__
'0.12.0-rc1'
You should seed it with:
import numpy as np
np.random.seed(42)
import tensorflow as tf
tf.set_random_seed(42)
"""
Different behaviors during training and testing
Some Keras layers (e.g. Dropout, BatchNormalization) behave differently at training time and testing time.
You can tell whether a layer uses the "learning phase" (train/test) by printing layer.uses_learning_phase,
a boolean: True if the layer has a different behavior in training mode and test mode, False otherwise.
If your model includes such layers, then you need to specify the value of the learning phase as part of feed_dict,
so that your model knows whether to apply dropout/etc or not.
To make use of the learning phase, simply pass the value "1" (training mode) or "0" (test mode) to feed_dict:
"""
import numpy as np
np.random.seed(42)
import tensorflow as tf
tf.set_random_seed(42)
sess = tf.Session()
from keras.layers import Dropout, Dense, LSTM
from keras import backend as K
K.set_session(sess)
from keras.objectives import categorical_crossentropy
from keras.metrics import categorical_accuracy as accuracy
# load data
from tensorflow.examples.tutorials.mnist import input_data
mnist_data = input_data.read_data_sets('MNIST_data', one_hot=True)
img = tf.placeholder(tf.float32, shape=(None, 784))
labels = tf.placeholder(tf.float32, shape=(None, 10))
x = Dense(128, activation='relu')(img)
x = Dropout(0.5)(x)
x = Dense(128, activation='relu')(x)
x = Dropout(0.5)(x)
preds = Dense(10, activation='softmax')(x)
loss = tf.reduce_mean(categorical_crossentropy(labels, preds))
# train_step = tf.train.GradientDescentOptimizer(0.5).minimize(loss)
train_step = tf.train.RMSPropOptimizer(learning_rate=0.001).minimize(loss)
# train_step = tf.train.AdagradOptimizer(learning_rate=0.001).minimize(loss)
# train_step = tf.train.AdadeltaOptimizer(learning_rate=0.001).minimize(loss)
with sess.as_default():
    sess.run(tf.global_variables_initializer())
    for i in range(100):
        batch = mnist_data.train.next_batch(50)
        train_step.run(feed_dict={img: batch[0],
                                  labels: batch[1],
                                  K.learning_phase(): 1})
acc_value = accuracy(labels, preds)
with sess.as_default():
    print(acc_value.eval(feed_dict={img: mnist_data.test.images,
                                    labels: mnist_data.test.labels,
                                    K.learning_phase(): 0}))
I heard that Keras is going to be merged into TensorFlow. Can I expect the reproducibility problem to be solved at the same time? If yes, it will be a great improvement for Kaggle usage!
@nejumi, ditto. This lack of support makes it really hard to run experiments with Keras & TF. I appreciate the convos and solutions here but really hoping this gets fixed soon.
In principle, this should do it:
import numpy as np
np.random.seed(...)
import tensorflow as tf
tf.set_random_seed(...)
However, there is still non-determinism in cuDNN.
With theano it is possible to ensure reproducibility of cuDNN by setting dnn.conv flags: https://github.com/fchollet/keras/issues/2479#issuecomment-213987747
With tensorflow, how do we set those flags?
For some time, I at least had reproducible results when running the training on the CPU. However, even that no longer seems to work. Has anyone experienced the same?
I'm looking for a way of reproducing Keras results, but I suppose it's not possible. Am I right?
Thanks @diogoff, but my problem is that I have TensorFlow as the backend and I also use cuDNN. That's the same case you are looking for a solution to.
I gave up on reproducibility because I found that when forcing deterministic behavior in cuDNN, training would be much slower (e.g. from 15 secs/epoch to 30 secs/epoch).
IMO this is a critical issue that merits a high priority. Running a complex model for several minutes is meaningless unless results can be reproduced.
Running the same cell multiple times has given results that differ by several orders of magnitude. I can confirm the latest suggestion does not work with Keras 2.0.2 / TensorFlow 1.0 backend / Anaconda 4.2 / Windows 7:
import numpy as np
np.random.seed(123)
import tensorflow as tf
tf.set_random_seed(123)
@pylang are you using cuDNN?
@diogoff I have not taken extra steps to install cuDNN. My assumption is no, though I am unsure how to verify this absolutely.
Try:
$ ls -las /usr/local/cuda/include/*dnn*
and
$ ls -las /usr/local/cuda/lib64/*dnn*
If you see libcudnn.so installed, you have it and probably tensorflow is using it.
If I remember, tensorflow will print some warning/info messages on startup, saying which libraries it has loaded. On my system, libcudnn.so was one of them.
I searched all files on my Windows machine and found none by that name, nor any system files with "cudnn" (only folders included in Anaconda's TensorFlow site package). I also don't see any warnings aside from the "TensorFlow backend" warning upon import. Seeing that
I have not directly installed the driver, find no library files under this name, and see no unusual warnings at import, I conclude I do not have cudnn installed.
On another note, I suspect the main issue with non-reproducible results in Keras may be related to how the weights are randomized on each call.
I did discover (late last night) that the kernel_initializer has a number of options for setting up the distribution from which (I assume) the weights are drawn. I have not run substantial tests to draw a conclusion, nor investigated these options further yet, but my initial tests seem to suggest that selecting different initializers influences the reproducibility of results. For instance, the default initializer is called "glorot_uniform". I played with some other distributions and managed to get more reproducible results, although with much higher error.
Since there are many variables, perhaps we should post a simple example here, e.g. a single Dense layer, one-input linear regression (see the sketch below). The results should be consistent for all implementers. We can then confirm the results across different machines for different users.
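To get the ball rolling, a minimal sketch of the kind of test I mean (a toy setup I just made up, assuming Keras 2 with the TF backend, run on the CPU):

import numpy as np
np.random.seed(123)
import tensorflow as tf
tf.set_random_seed(123)

from keras.models import Sequential
from keras.layers import Dense
from keras.initializers import glorot_uniform

# Toy data: y = 2x + 1 plus a little noise (seeded above, so identical every run).
x = np.linspace(-1, 1, 200).reshape(-1, 1)
y = 2 * x + 1 + 0.05 * np.random.randn(*x.shape)

# One-input linear regression: a single Dense unit with an explicitly seeded initializer.
model = Sequential()
model.add(Dense(1, input_dim=1, kernel_initializer=glorot_uniform(seed=123)))
model.compile(optimizer='sgd', loss='mse')

model.fit(x, y, epochs=20, batch_size=20, shuffle=False, verbose=0)
print(model.evaluate(x, y, verbose=0))  # compare this value across runs and machines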
I picked up mnist_cnn.py from the examples and set up keras.json in this way:
{
"image_data_format": "channels_first",
"epsilon": 1e-07,
"floatx": "float32",
"backend": "tensorflow"
}
I ran python mnist_cnn.py a couple of times and the results did not seem to be reproducible.
Then I edited mnist_cnn.py and inserted the following code between from __future__ import print_function (line 8) and import keras (line 9):
import numpy as np
np.random.seed(123)
import tensorflow as tf
tf.set_random_seed(123)
The results now look sufficiently reproducible to me. The small differences I assume are due to the use of cuDNN.
I tried running without cuDNN:
$ TF_USE_CUDNN=0 python mnist_cnn.py
but it seems it's not possible:
UnimplementedError (see above for traceback): Conv2D for GPU is not currently supported without cudnn
If I switch the backend to Theano:
{
"image_data_format": "channels_first",
"epsilon": 1e-07,
"floatx": "float32",
"backend": "theano"
}
and insert the following code between lines 8-9 in mnist_cnn.py:
import numpy as np
np.random.seed(123)
and then run:
$ THEANO_FLAGS="dnn.conv.algo_bwd_filter=deterministic,dnn.conv.algo_bwd_data=deterministic" python mnist_cnn.py
the results are fully reproducible.
@diogoff for clarity, what do you consider fully reproducible? Do you know how close your loss results are between runs? I'd like to compare notes.
By fully reproducible, I mean I always get exactly the same results in every run:
loss: 0.3336 - acc: 0.8981 - val_loss: 0.0788 - val_acc: 0.9759
loss: 0.1214 - acc: 0.9642 - val_loss: 0.0548 - val_acc: 0.9828
loss: 0.0893 - acc: 0.9733 - val_loss: 0.0443 - val_acc: 0.9847
loss: 0.0735 - acc: 0.9783 - val_loss: 0.0391 - val_acc: 0.9871
loss: 0.0666 - acc: 0.9804 - val_loss: 0.0363 - val_acc: 0.9872
loss: 0.0590 - acc: 0.9825 - val_loss: 0.0369 - val_acc: 0.9873
loss: 0.0542 - acc: 0.9836 - val_loss: 0.0338 - val_acc: 0.9889
loss: 0.0505 - acc: 0.9850 - val_loss: 0.0314 - val_acc: 0.9889
loss: 0.0467 - acc: 0.9861 - val_loss: 0.0299 - val_acc: 0.9896
loss: 0.0451 - acc: 0.9867 - val_loss: 0.0319 - val_acc: 0.9898
loss: 0.0421 - acc: 0.9874 - val_loss: 0.0297 - val_acc: 0.9894
loss: 0.0405 - acc: 0.9880 - val_loss: 0.0309 - val_acc: 0.9895
Test loss: 0.0309449151449 <-- exactly the same up to the last digit
Test accuracy: 0.9895
Keras 2.0.2, Theano 0.9.0 with libgpuarray, CUDA 8.0, cuDNN 5.1.10
I have the same issue when using TensorFlow on the CPU. I searched for a solution online for 2 days and found that this solved my issue: http://stackoverflow.com/questions/42412660/non-deterministic-gradient-computation
In short, the reason is that TensorFlow uses multiple threads or cores for the computation, which becomes a hidden issue when floating-point values are rounded and shared among multiple threads.
To fix this, if you don't care about speed, just limit the number of threads TF uses when creating a session:
sess = tf.Session(config=tf.ConfigProto(inter_op_parallelism_threads=1))
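In a Keras script, the same idea amounts to registering that single-threaded session with the backend before building the model; a minimal sketch (assuming the TF 1.x ConfigProto/Session API and standalone Keras):

import tensorflow as tf
from keras import backend as K

# One thread for both intra- and inter-op parallelism: slower, but the order
# of floating-point reductions is fixed, so CPU results become repeatable.
session_conf = tf.ConfigProto(intra_op_parallelism_threads=1,
                              inter_op_parallelism_threads=1)
K.set_session(tf.Session(config=session_conf))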
Can anyone solve the reproducibility issue for training recurrent layers?
Is there any update on this issue? I am experiencing it with the simple CIFAR10 example network, running on TF in GPU mode. Properly seeding both numpy.random.seed() and tf.set_random_seed() does not fix the issue. It seems that weight initialization is the same, but the weights diverge as training progresses.
It's going to be difficult to publish results with Keras+TF if the results are not reproducible.
I will focus on this issue.
I got a different value for validation accuracy versus evaluation after loading the weights on the same dataset (the validation data). So I cannot use the trained model anywhere.
With the solution of @JacobIsrael123, my initialization (and thus the first loss from model.evaluate before fit) is the same. However, the accuracies and losses differ when fitting.
I am computing only on the CPU. I set shuffle=False in the fit function. With Theano, my code produces the same results.
After trying all the above suggestions, the results are NOT reproducible, even on the CPU, for Keras + TensorFlow.
Is this still open for tensorflow? Any updates from the keras 2.0 changes? Even if there is some kind of workaround, this is important to me. Very hard to debug without determinism.
Theano should give reproducible results!!!! Are you getting the same accuracy at least?
This configuration at the top of the code seems to work for me:
import numpy as np
import tensorflow as tf
import random as rn
np.random.seed(42)
rn.seed(12345)
session_conf = tf.ConfigProto(intra_op_parallelism_threads=1,
                              inter_op_parallelism_threads=1)
from keras import backend as K
tf.set_random_seed(1234)
sess = tf.Session(graph=tf.get_default_graph(), config=session_conf)
K.set_session(sess)
----ALSO set the shuffle=False in the fit call----
Note: This config forces single threaded operation. Allowing multithread seems to cause non-reproducible results (as pointed out above by @MorvanZhou). This is running in CPU mode.
Doing what @td2014 mentioned (except for setting shuffle=False) didn't work, but once I added os.environ['PYTHONHASHSEED'] = '0' in addition to what he suggested, it worked! Setting the PYTHONHASHSEED environment variable seems necessary for Python 3.
import os
import numpy as np
import random as rn
import tensorflow as tf
# Setting PYTHONHASHSEED for determinism was not listed anywhere for TensorFlow,
# but apparently it is necessary for the Theano backend
# (https://github.com/fchollet/keras/issues/850).
os.environ['PYTHONHASHSEED'] = '0'
np.random.seed(7)
rn.seed(7)
# Limit operation to 1 thread for deterministic results.
session_conf = tf.ConfigProto(
intra_op_parallelism_threads=1,
inter_op_parallelism_threads=1
)
from keras import backend as K
tf.set_random_seed(7)
sess = tf.Session(graph=tf.get_default_graph(), config=session_conf)
K.set_session(sess)
[...rest of code...]
Edit: @wanting0wang seems to be correct below. Got ahead of myself.
I tried the method suggested by @abali96 to reproduce my keras+TF model in Jupyter notebook. What I observed is that during a kernel's lifetime the training and evaluation can be reproduced.
That is to say, I can open a jupyter notebook, set seeds, fit the model, and the result (and training process) I get can be reproduced if I run the script again from the beginning. When I shutdown the kernel and reopen it, the results I get from the previous kernel cannot be reproduced. If I have several kernels running at the same time, results yielded from kernel A cannot be reproduced in kernel B even if they share exactly the same hyperparameters and seeds.
Does anyone have the same problem, or any advice?
I tried @abali96's method but still do not have reproducible results across runs either (each run under a new kernel). My system is configured to use libcudnn.so.
>>> keras.__version__
'2.0.4'
>>> tensorflow.__version__
'1.1.0'
I believe this is known, but if I switch my code to use Theano as the backend, I need to set "conv.algo_bwd_data=deterministic" and "conv.algo_bwd_filter=deterministic" to achieve perfect reproducibility. I think there needs to be an equivalent option in Tensorflow, but I am not sure if there is one. If this is where the problem is coming from, then it isn't necessarily a Keras issue.
I wonder if anyone has solved this by trying TensorFlow 1.3 with cuDNN 6?
I am using Keras with the TensorFlow backend. I have to use TensorFlow only; I can't change to the Theano backend. I am creating a simple 1-layer LSTM model. I need my code to give the same val_loss every time I train on the same data. I am running on CPU only. I tried:
from numpy.random import seed
seed(1337)
from tensorflow import set_random_seed
set_random_seed(1337)
at the top of my code. I also initialized all the kernel and recurrent weights to ones:
model.add(LSTM(150, input_shape=(None, 124), W_regularizer=l2(0.001),
               kernel_initializer='ones', recurrent_initializer='ones',
               bias_initializer='ones'))
model.add(Dense(2, kernel_initializer='ones', bias_initializer='ones'))
model.add(Activation("softmax"))
I also set shuffle=False in model.fit(). I am using RMSprop for optimization.
I also set PYTHONHASHSEED to 0. But I am still getting a different train loss as well as val loss at each epoch when run multiple times.
I am running Keras on a server with 56 CPU cores and CentOS.
Please help soon, as I have tried everything everyone has suggested in other threads!
+1
I've tested this toy script on both Ubuntu (with a GeForce GTX 1080 Ti GPU) and a MacBook Pro (CPU only). While the least significant digits of the loss values differ across platforms, the accuracy is consistent. I've run the script 10 times in a row on both platforms and see the same results each time.
The Ubuntu machine is running:
$ conda list | grep -i cud
cudatoolkit 8.0 1
cudnn 6.0.21 cuda8.0_0
tensorflow-gpu 1.2.1 py36cuda8.0cudnn6.0_0
Both machines are running:
$ python -V
Python 3.6.2 :: Anaconda custom
Here is the output from my Ubuntu machine. Here is the output from macOS.
I'm happy to try out other tests/suggestions if it would be helpful.
TensorFlow is completely useless because of this issue. What's the point of training models if you can't compare their performance because the results aren't reproducible?
Official answer to the question: "How can I obtain reproducible results using Keras during development?" - https://keras.io/getting-started/faq/ But it just doesn't work!
I also had a very similar, if not the same, issue with the TF backend. It was quite severe: perhaps partly due to my relatively small model and dataset, the final performance varied significantly. I tried fixing the TensorFlow random seed, fixing the init (later even with a pickled file for the initial weights), and providing the same data samples, but still ended up with different results. It seems like TensorFlow's multithreading steps in somehow, since turning off TF's multithreading 'somehow' worked, though not completely (say, in a repeated experiment, 2 out of 4 runs happened to be identical).
Moved to Theano, passed the THEANO flags that @diogoff mentioned ($ THEANO_FLAGS="dnn.conv.algo_bwd_filter=deterministic,dnn.conv.algo_bwd_data=deterministic" python mnist_cnn.py), and everything seems solved.
Interesting that it's not a huge problem for most of the people out there. Maybe people are using large enough datasets that the stochastic training process makes it less problematic.
How about initializing it on the CPU with a fixed seed, saving the params, and then porting them to the GPU? Is that possible? I just have the idea, but can't figure out how to code it (if it works).
Yes, that's pretty much what I did by storing the weights in a pickled file.
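For anyone who wants to try the same thing, a minimal sketch using Keras's built-in weight saving instead of pickling (build_model(), the file name, and the training arrays are placeholders of mine, not from the code above):

import numpy as np
np.random.seed(123)
import tensorflow as tf
tf.set_random_seed(123)

# --- Run once, e.g. on a CPU-only machine, with seeds fixed ---
model = build_model()                  # your own model-building function
model.save_weights('init_weights.h5')  # freeze the random initialization

# --- Later, on the GPU machine, rebuild the same architecture ---
model = build_model()
model.load_weights('init_weights.h5')  # start from the exact same weights
model.fit(x_train, y_train, shuffle=False)

Note this only pins the initialization; as discussed above, cuDNN can still introduce nondeterminism during GPU training.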
I had this problem today and just using the above mentioned code, I mean this one:
import numpy as np
np.random.seed(123)
import tensorflow as tf
tf.set_random_seed(123)
solved the random-results issue. It seems the issue with TensorFlow has been fixed. I am using a GPU with CUDA 8.0.61 and cuDNN 6.0,
and this link confirms it: https://keras.io/getting-started/faq/#how-can-i-obtain-reproducible-results-using-keras-during-development
I copied the code given in the link at the top of my program, but I'm still having this issue. I'm using a Jupyter notebook on a MacBook. Only when I restart the whole kernel do I get the same results, so the first run always corresponds to the first run, the second to the second, etc. But every time I train within the same kernel, I get different results.
What am I doing wrong?
With the Theano backend (CPU, or GPU without cuDNN), I could train a reproducible model by:
While in pure TensorFlow, without the Keras wrapper, it can also be made reproducible by:
I don't know why, but with Keras + the TensorFlow backend, none of the above gives a reproducible training model.
Environment:
BTW: it would be great if Keras could expose a unified API for reproducible training, something like:
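(Purely a hypothetical illustration of such an API; keras.set_seed does not exist:)

import keras

# Hypothetical one-call API: seed every source of randomness at once --
# NumPy, the backend RNG, weight initializers, data shuffling.
keras.set_seed(42)

# ... build and fit the model as usual; every run would then be identical.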