rstudio / tensorflow

TensorFlow for R
https://tensorflow.rstudio.com
Apache License 2.0

Error in .External2(C_X11, paste("png::", filename, sep = ""), when using tuning_run() #326

Closed msloryg closed 5 years ago

msloryg commented 5 years ago

I am trying to run some tuning runs on a cluster server.

require(tfruns)
require(tensorflow)

runs <- tuning_run("cnn_test_wmodel.R", sample = 0.8, confirm = FALSE,
  flags = list(
    output = c(50, 200),
    loss = c("mse", "mape")
  ),
  runs_dir = "hyper_tuning"
)

I get the following error:

4 total combinations of flags (sampled to 3 combinations)
Training run 1/3 (flags = list(output = 200, dropout = 0.5, epochs = 100, loss = "mape")) 
Error in .External2(C_X11, paste("png::", filename, sep = ""), g$width,  : 
  unable to start device PNG
Calls: tuning_run ... with_changed_file_copy -> force -> do.call -> <Anonymous>
In addition: Warning message:
In (function (filename = "Rplot%03d.png", width = 480, height = 480,  :
  unable to open connection to X11 display ''
Execution halted

I did tuning runs on my laptop before and they worked. What I really wonder is which PNG file the error refers to: tuning_run() normally does not produce any plots as far as I can tell. Here is my sessionInfo():

> sessionInfo()
R version 3.4.3 (2017-11-30)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

Matrix products: default

locale:
[1] LC_COLLATE=German_Germany.1252  LC_CTYPE=German_Germany.1252   
[3] LC_MONETARY=German_Germany.1252 LC_NUMERIC=C                   
[5] LC_TIME=German_Germany.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.0             lattice_0.20-35        zeallot_0.1.0         
 [4] grid_3.4.3             R6_2.4.0               jsonlite_1.6          
 [7] magrittr_1.5           tfruns_1.4             whisker_0.3-2         
[10] Matrix_1.2-14          reticulate_1.12.0-9001 generics_0.0.2        
[13] keras_2.2.4.1.9001     tools_3.4.3            yaml_2.2.0            
[16] compiler_3.4.3         base64enc_0.1-3        tensorflow_1.13.1.9000

Does anybody have an idea? If you need more information please let me know!

EDIT: In the folder where I save the runs I found a subfolder "plots" containing a .png file, but when I try to open it I get a message along the lines of "this file format is probably not supported" (the original message is in German). I guess that is the issue. But what is this file? On my laptop the folder exists as well, but it is completely empty.

skeydan commented 5 years ago

I think you might be running this on a server which has no graphical mode:

In (function (filename = "Rplot%03d.png", width = 480, height = 480, : unable to open connection to X11 display ''

It should work if you remove all printing output.
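
(Not what is suggested above, just an alternative sketch: if the cluster's R build has cairo support, png() can render via cairo instead of the Xlib/X11 backend and then needs no display. Something like this at the top of the training script:)

if (capabilities("cairo")) {
  # with bitmapType = "cairo", png() uses cairo and does not need an X11 display
  options(bitmapType = "cairo")
}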

msloryg commented 5 years ago

But how do I remove all printing output? tuning_run() does not visibly produce any output, so I don't know how I would tell the function not to do it.

skeydan commented 5 years ago

could you indicate how you are running your code? In a script from the command line or using RStudio Server?

msloryg commented 5 years ago

I am running a shell script (a .qs file) on the server; this script runs the .R file that calls tuning_run().

eddelbuettel commented 5 years ago

There are time-honoured ways to run 'headless' -- one common approach is to prefix the command with xvfb-run. That wraps the call in a virtual framebuffer server, which makes the usual X11 resources (font metrics etc.) available as if it were running normally.
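
For example (just a sketch; it assumes xvfb is available on the execution host, your_script.R stands in for whatever the batch job runs, and -a picks a free display number automatically):

xvfb-run -a Rscript your_script.R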

skeydan commented 5 years ago

thanks @eddelbuettel !

@msloryg can you try that? also, can you provide the content of cnn_test_wmodel.R?

msloryg commented 5 years ago

Thanks @eddelbuettel, where do I have to put the xvfb-run? Unfortunately I am not at all familiar with servers. What I usually do is start my shell script with qsub myscript.qs, and myscript.qs looks like this:

#$ -S /bin/sh
#$ -l h_rt=72:00:00
#$ -cwd

#$ -N zinb
# -j n
#$ -pe smp 1
#$ -m abes

#$ -l h_rss=10G
#$ -l h_vmem=30G 

module load gcc/5.2.0
module load r/3.2.3
module load anaconda
python -c 'import tensorflow'

R CMD BATCH --cnn_skript_wmodel.R 

Of course, @skeydan. This is my cnn_test_wmodel.R (the file I pass to tuning_run()):

library(keras)
library(tensorflow)
require(abind)
require(caret)
require(reticulate)

load("cnn_dataset.Rda")

FLAGS <- flags(flag_integer("filters1", 32), 
               flag_integer("filters2", 64), 
               flag_integer("fsize1", 9),
               flag_integer("fsize2", 3),
               flag_integer("stride1", 2),
               flag_integer("stride2", 1),
               flag_integer("output", 50),
               flag_numeric("dropout", 0.4),
               flag_numeric("reg", 0.01),
               flag_string("loss", "mse"),
               flag_integer("epochs", 100))

samples <- dim(x_train)[1]
n_rows <- dim(x_train)[2]
n_timesteps <- dim(x_train)[3]  # ncols: days
n_features <- dim(x_train)[4]   # channels
n_outputs <- 1

model <- keras_model_sequential()

model %>%
  layer_conv_2d(filters = FLAGS$filters1, kernel_size = FLAGS$fsize1, activation = "relu",
                input_shape = c(n_rows, n_timesteps, n_features),
                kernel_regularizer = regularizer_l2(l = FLAGS$reg),
                strides=FLAGS$stride1, padding="same") %>%
  layer_conv_2d(filters = FLAGS$filters1, kernel_size = FLAGS$fsize1, activation = "relu",
                kernel_regularizer = regularizer_l2(l = FLAGS$reg),
                strides=FLAGS$stride1, padding="same") %>%
  layer_max_pooling_2d(pool_size = c(2,2)) %>%
  layer_conv_2d(filters = FLAGS$filters2, kernel_size = FLAGS$fsize2, activation = "relu",
                kernel_regularizer = regularizer_l2(l = FLAGS$reg),
                strides=FLAGS$stride2) %>%
  layer_conv_2d(filters = FLAGS$filters2, kernel_size = FLAGS$fsize2, activation = "relu",
                kernel_regularizer = regularizer_l2(l = FLAGS$reg),
                strides=FLAGS$stride2) %>%
  layer_max_pooling_2d(pool_size = c(2,2)) %>%
  layer_flatten() %>%
  layer_dropout(rate=FLAGS$dropout) %>%
  layer_dense(units = FLAGS$output, activation = "relu") %>%
  layer_dense(units=1)
#summary(model)

model %>% compile(
  loss = FLAGS$loss,
  optimizer = optimizer_adam(lr=FLAGS$lrate),
  metrics = c('mse')
)

# Training & Evaluation ----------------------------------------------------

# Fit model to data
history <- model %>% fit(
  x_train, y_train,
  batch_size = 32,
  epochs = FLAGS$epochs,
  verbose = 0,
  validation_split = 0.2,
  callbacks = list(
    callback_reduce_lr_on_plateau(patience = 4),
    callback_early_stopping(patience = 5),
    callback_terminate_on_naan()
  )
)

eddelbuettel commented 5 years ago

Here is a helper script that is even more advanced (as it does locking): https://github.com/eddelbuettel/prrd/blob/master/inst/scripts/xvfb-run-safe

It has a pointer to a StackOverflow question. Maybe all that helps you a little. I am traveling right now ...