greta-dev / greta

simple and scalable statistical modelling in R
https://greta-stats.org
Other
527 stars 63 forks source link

Greta not respecting random seed? #285

Open cboettig opened 5 years ago

cboettig commented 5 years ago

I'm not getting consistent identical results from re-knitting this little greta mcmc, running on GPU.

the differences are small, but non-identical, see summary table in this diff

Here's sessionInfo():

sessionInfo()
R version 3.6.0 (2019-04-26)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.2 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8       
 [4] LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C              
[10] LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] greta_0.3.0.9002

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.1        rstudioapi_0.10   knitr_1.23        whisker_0.3-2     magrittr_1.5     
 [6] hms_0.4.2         progress_1.2.2    lattice_0.20-38   R6_2.4.0          rlang_0.3.4      
[11] globals_0.12.4    tools_3.6.0       parallel_3.6.0    grid_3.6.0        packrat_0.5.0    
[16] xfun_0.7          coda_0.19-2       htmltools_0.3.6   tfruns_1.4        yaml_2.2.0       
[21] digest_0.6.19     assertthat_0.2.1  crayon_1.3.4      tensorflow_1.13.1 Matrix_1.2-17    
[26] base64enc_0.1-3   codetools_0.2-16  rsconnect_0.8.13  evaluate_0.14     rmarkdown_1.13   
[31] compiler_3.6.0    prettyunits_1.0.2 reticulate_1.12   jsonlite_1.6      future_1.13.0    
[36] listenv_0.7.0     pkgconfig_2.0.2  

Here's tf_config():

> tensorflow::tf_config()
TensorFlow v1.13.1 (/opt/venv/lib/python3.6/site-packages/tensorflow)
Python v3.6 (/opt/venv/bin/python)
> tensorflow::tf_gpu_configured()
TensorFlow built with CUDA:  TRUE 
GPU device name:  /device:GPU:0[1] TRUE
goldingn commented 5 years ago

Huh, that is strange. I don't suppose you've tried this on a CPU?

Getting tensorflow to respect random seeds was a bit of an issue, but I (thought I had) fixed that and tested on CPU. I wouldn't be entirely surprised if it just doesn't respect them on GPU. Will look into it.

cboettig commented 5 years ago

@goldingn Correct, if I run on CPU (by setting Sys.setenv("CUDA_VISIBLE_DEVICES" = -1) to hide the GPU), then I get precisely identical results each time (other than runtime of course).

Results on the GPU are pretty close each time, but not identical. (Nor are results on GPU and CPU identical, but I that's probably to be expected).

Perhaps this is already documented on the tensorflow side, I'm looking at https://tensorflow.rstudio.com/tensorflow/reference/use_session_with_seed.html, and if I do tensorflow::use_session_with_seed(42) it seems to disable the GPU. (Didn't realize tensorflow() provided this, I expected it to obey the usual set.seed() in R; which at least with greta in CPU mode works fine as seen above. Looks like tensorflow::use_session_with_seed also calls py_set_seed(seed). It also looks like it disables parallelism on cpu by default, but maybe I don't understand what that means since it still seems to use all cores in CPU mode for me, or maybe greta is overriding it?

Sorry for filing issues while I'm still learning my way around some of the basics!

goldingn commented 5 years ago

OK, that makes sense, thanks!

Yeah, we grab the seed from the R environment and push it into the TF environment so that we can use consistent seeds even when using future for parallelism (across remote machines etc.)

I'm fairly confident this GPU non-determinism is entirely coming from tensorflow (or rather the GPU routines it uses) and we can't fix it on the greta side. I see there are various blog posts and GitHub issues about this.

We could add a note in the greta documentation about this, though I'm not sure where.

cboettig commented 5 years ago

:+1: good on you for pushing the seed from R to TF automatically.

Right, it might be best to document the issue and suggest users call tensorflow::use_session_with_seed() when they need reproducible runs (and can accept a non-GPU run).

A more aggressive 'fix' would be to mimic what use_session_with_seed() does by automatically disabling the GPU if the user has set a seed, but that's probably poor design to create a relatively unexpected side-effect to a common R function. (I guess that's probably why the RStudio team did it the way they did with a separate function).