Open VladPerervenko opened 6 years ago
Thanks! I had thought of (1) but wasn't sure how to do it without making it too cumbersome. The point is important, though, so I will give it some more thought. Thanks for the recommendation of darch!
Will keep you posted as I make updates... Cheers, Pete
Full reproducibility requires the user to disable GPU and CPU parallelism. kms 0.5.0 now lets you do that since seed accepts a list containing the relevant parameters. If you set the seed but don't disable those things, you'll work on the same test/train splits but may still have simulation error. (kms now implements a wrapper for keras::use_session_with_seed.)
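To make the two effects above concrete, here is a small base R illustration. It does not use keras or kerasformula, and split_indices is a made-up helper, not how kms splits data internally. It only shows (1) why a fixed seed gives the same train/test split and (2) one common reason results can still drift when work is combined in a different order, as parallel execution may do.

```r
# Illustration only: base R, no keras required.
# split_indices() is a hypothetical helper, not part of kerasformula.

# (1) A fixed seed reproduces the same train/test split.
split_indices <- function(n, p_train = 0.8, seed = 12345) {
  set.seed(seed)
  sample(n, size = floor(p_train * n))
}

train_a <- split_indices(100)
train_b <- split_indices(100)
identical(train_a, train_b)  # TRUE

# (2) Floating-point addition is not associative, so the order in which
# partial results are combined can change the final value slightly.
left  <- (0.1 + 0.2) + 0.3
right <- 0.1 + (0.2 + 0.3)
left == right  # FALSE
```

The second part is only an analogy for the small run-to-run differences mentioned above; the exact sources of nondeterminism on a given GPU or CPU backend may differ.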
I'm not sure that this will solve the problem. See _tf.set_random_seed_: "Sets the graph-level random seed. Operations that rely on a random seed actually derive it from two seeds: the graph-level and operation-level seeds. This sets the graph-level seed..." Further experiments are needed. I will report back with the results.
Thanks for following up. I actually think it's sufficient. I just added a post showing that I'm able to reproduce predictions for a continuous outcome identically.
Here's the key code:
library(kerasformula)
movies <- read.csv("http://s3.amazonaws.com/dcwoods2717/movies.csv")
out <- kms(log10(gross/budget) ~ . -title, movies, scale="z",
           seed = list(seed = 12345, disable_gpu = TRUE, disable_parallel_cpu = TRUE))
out2 <- kms(log10(gross/budget) ~ . -title, movies, scale="z",
            seed = list(seed = 12345, disable_gpu = TRUE, disable_parallel_cpu = TRUE))
identical(out$y_test, out2$y_test)
identical(out$predictions, out2$predictions)
(I've run this at batch_size=1 and batch_size=32 for various seeds and number of epochs.)
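If you want to repeat this kind of check over several seeds, something like the following might help. It is only a sketch: check_reproducible and toy_fit are hypothetical names, not part of kerasformula, and toy_fit is a stand-in for a real model fit so the snippet runs in base R.

```r
# Hypothetical helper (not part of kerasformula): runs `fit_fn` twice per
# seed and reports whether the two results are bit-for-bit identical.
check_reproducible <- function(fit_fn, seeds = c(1, 12345)) {
  vapply(seeds, function(s) identical(fit_fn(s), fit_fn(s)), logical(1))
}

# Stand-in for a model fit: a seeded random draw. With kms, fit_fn could
# instead wrap a call like the one above and return out$predictions.
toy_fit <- function(seed) {
  set.seed(seed)
  rnorm(5)
}

check_reproducible(toy_fit)
```

Each element of the result corresponds to one seed, so a single FALSE points to the seed where the two runs diverged.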
Idea.