rstudio / keras3

R Interface to Keras
https://keras3.posit.co/

*** caught segfault *** address 0x7fc0e4463c00, cause 'invalid permissions' at keras_model_sequential() on computing clusters #1368

Closed denvercal1234GitHub closed 1 year ago

denvercal1234GitHub commented 1 year ago

Hi there,

I submitted a bash Rscript job on our computing cluster to run the infinityFlow R package (https://github.com/ebecht/infinityFlow), which relies on tensorflow through reticulate, but encountered the error below.

Would you mind helping me diagnose the errors?

Thank you for your help.

library(reticulate)

reticulate::conda_create(envname="/ceph/project/borrowlab/qnguyen/Temporary/F37_LIVECD3/r-reticulate", python_version = "3.11")

library(tensorflow)

tensorflow::install_tensorflow(envname = "/ceph/project/borrowlab/qnguyen/Temporary/F37_LIVECD3/r-reticulate", version="2.12")

library(keras)

keras::install_keras(envname = "/ceph/project/borrowlab/qnguyen/Temporary/F37_LIVECD3/r-reticulate", version = "2.12")

reticulate::use_condaenv("/ceph/project/borrowlab/qnguyen/Temporary/F37_LIVECD3/r-reticulate")

library(infinityFlow)
library(glmnetUtils)
library(e1071)
library(tensorflow)
library(keras)

regression_functions <- list(
  XGBoost = fitter_xgboost, # XGBoost
  ## Passed to fitter_nn, e.g. neural networks through keras::fit. See https://keras.rstudio.com/articles/tutorial_basic_regression.html
  NN = fitter_nn,
  SVM = fitter_svm # SVM
  #LASSO2 = fitter_glmnet, # L1-penalized 2nd degree polynomial model
  #LM = fitter_linear # Linear model
)

extra_args_regression_params <- list(
  ########Passed to the first element of `regression_functions`, e.g. XGBoost. See ?xgboost for which parameters can be passed through this list

  list(nrounds = 500, eta = 0.05),

  ########Passed to the second element of `regression_functions`, e.g. NEURAL NETWORKS through keras::fit. See https://keras.rstudio.com/articles/tutorial_basic_regression.html
  list(
    object = { ## Specifies the network's architecture, loss function and optimization method
      model = keras_model_sequential()
      model %>%
        layer_dense(units = backbone_size, activation = "relu", input_shape = backbone_size) %>%
        layer_dense(units = backbone_size, activation = "relu", input_shape = backbone_size) %>%
        layer_dense(units = 1, activation = "linear")
      model %>%
        compile(loss = "mean_squared_error", optimizer = optimizer_sgd(learning_rate = 0.005))
      serialize_model(model)
    },
    epochs = 1000, ## Number of maximum training epochs. The training is however stopped early if the loss on the validation set does not improve for 20 epochs. This early stopping is hardcoded in fitter_nn.
    validation_split = 0.2, ## Fraction of the training data used to monitor validation loss
    verbose = 0,
    batch_size = 128 ## Size of the minibatches for training.
  ),
  ########Passed to the third element, SVMs. See help(svm, "e1071") for possible arguments
  list(type = "nu-regression", cost = 8, nu=0.5, kernel="radial")
)
*** caught segfault ***
address 0x7fc0e4463c00, cause 'invalid permissions'

Traceback:
 1: py_module_import(module, convert = convert)
 2: import(module)
 3: doTryCatch(return(expr), name, parentenv, handler)
 4: tryCatchOne(expr, names, parentenv, handlers[[1L]])
 5: tryCatchList(expr, classes, parentenv, handlers)
 6: tryCatch(import(module), error = clear_error_handler())
 7: py_resolve_module_proxy(x)
 8: `$.python.builtin.module`(keras, "models")
 9: keras$models
10: keras_model_sequential()
An irrecoverable exception occurred. R is aborting now ...
Segmentation fault (core dumped)

I tried running some diagnostic code:

 *** caught segfault ***
address 0x7f02dce2cc00, cause 'invalid permissions'

Traceback:
 1: py_module_import(module, convert = convert)
 2: import(module)
 3: doTryCatch(return(expr), name, parentenv, handler)
 4: tryCatchOne(expr, names, parentenv, handlers[[1L]])
 5: tryCatchList(expr, classes, parentenv, handlers)
 6: tryCatch({    import(module)    TRUE}, error = clear_error_handler(FALSE))
 7: py_module_available("tensorflow")
 8: tensorflow::tf_config()
An irrecoverable exception occurred. R is aborting now ...
Segmentation fault (core dumped)
> reticulate::py_config()
python:         /ceph/project/borrowlab/qnguyen/Temporary/F37_LIVECD3/r-reticulate/bin/python
libpython:      /ceph/project/borrowlab/qnguyen/Temporary/F37_LIVECD3/r-reticulate/lib/libpython3.10.so[NOT FOUND]
pythonhome:     /ceph/project/borrowlab/qnguyen/Temporary/F37_LIVECD3/r-reticulate:/ceph/project/borrowlab/qnguyen/Temporary/F37_LIVECD3/r-reticulate
version:        3.10.12 | packaged by conda-forge | (main, Jun 23 2023, 22:40:32) [GCC 12.3.0]
numpy:          /ceph/project/borrowlab/qnguyen/Temporary/F37_LIVECD3/r-reticulate/lib/python3.10/site-packages/numpy
numpy_version:  1.25.0
tensorflow:     /ceph/project/borrowlab/qnguyen/Temporary/F37_LIVECD3/r-reticulate/lib/python3.10/site-packages/tensorflow

NOTE: Python version was forced by use_python function
> tensorflow::tf_config()
Error in py_get_attr_impl(x, name, silent) : 
  AttributeError: partially initialized module 'tensorflow' has no attribute 'VERSION' (most likely due to a circular import)
Run `reticulate::py_last_error()` for details.
> reticulate::import("tensorflow")
Module(tensorflow)
> reticulate::py_last_error()

── Python Exception Message ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
AttributeError: partially initialized module 'tensorflow' has no attribute 'VERSION' (most likely due to a circular import)

── R Traceback ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
    ▆
 1. └─tensorflow::tf_config()
 2.   ├─tf$VERSION
 3.   └─reticulate:::`$.python.builtin.module`(tf, "VERSION")
 4.     └─reticulate:::`$.python.builtin.object`(x, name)
 5.       └─reticulate:::py_get_attr_or_item(x, name, TRUE)
 6.         └─reticulate::py_get_attr(x, name)
 7.           └─reticulate:::py_get_attr_impl(x, name, silent)
> sessionInfo()
R version 4.3.0 (2023-04-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 22.04.2 LTS

Matrix products: default
BLAS:   /ceph/package/u22/R-base/4.3.0/lib/R/lib/libRblas.so 
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/liblapack.so.3;  LAPACK version 3.10.0

locale:
 [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C               LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8     LC_MONETARY=en_GB.UTF-8   
 [6] LC_MESSAGES=en_GB.UTF-8    LC_PAPER=en_GB.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C       

time zone: Europe/London
tzcode source: system (glibc)

attached base packages:
[1] grid      stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] e1071_1.7-13         glmnetUtils_1.1.8    gridExtra_2.3        ggcyto_1.28.0        ncdfFlow_2.46.0      BH_1.81.0-1          flowClust_3.38.0    
 [8] lubridate_1.9.2      forcats_1.0.0        stringr_1.5.0        dplyr_1.1.2          purrr_1.0.1          readr_2.1.4          tidyr_1.3.0         
[15] tibble_3.2.1         tidyverse_2.0.0      ggplot2_3.4.2        vegan_2.6-4          permute_0.9-7        Spectre_1.0.0        flowVS_1.32.0       
[22] flowStats_4.12.0     flowViz_1.64.0       lattice_0.21-8       flowSpecs_1.14.0     CytoML_2.12.0        flowWorkspace_4.12.0 scico_1.4.0         
[29] cytotidyr_0.0.1.100  kableExtra_1.3.4     infinityFlow_1.10.0  flowCore_2.12.0      keras_2.11.1         tensorflow_2.11.0    reticulate_1.30     

loaded via a namespace (and not attached):
  [1] splines_4.3.0       later_1.3.1         bitops_1.0-7        matlab_1.0.4        graph_1.78.0        XML_3.99-0.14       deSolve_1.35       
  [8] lifecycle_1.0.3     rprojroot_2.0.3     MASS_7.3-60         magrittr_2.0.3      rmarkdown_2.22      yaml_2.3.7          httpuv_1.6.11      
 [15] sp_2.0-0            pbapply_1.7-2       RColorBrewer_1.1-3  zlibbioc_1.46.0     rvest_1.0.3         BiocGenerics_0.46.0 RCurl_1.98-1.12    
 [22] pracma_2.4.2        rappdirs_0.3.3      S4Vectors_0.38.1    terra_1.7-29        svglite_2.1.1       codetools_0.2-19    xml2_1.3.4         
 [29] tidyselect_1.2.0    shape_1.4.6         raster_3.6-20       matrixStats_1.0.0   stats4_4.3.0        base64enc_0.1-3     webshot_0.5.4      
 [36] jsonlite_1.8.5      CytobankAPI_2.2.1   ks_1.14.0           ellipsis_0.3.2      survival_3.5-5      iterators_1.0.14    systemfonts_1.0.4  
 [43] foreach_1.5.2       tools_4.3.0         Rcpp_1.0.10         glue_1.6.2          mnormt_2.1.1        tfruns_1.5.1        here_1.0.1         
 [50] xfun_0.39           mgcv_1.8-42         withr_2.5.0         fastmap_1.1.1       latticeExtra_0.6-30 fansi_1.0.4         digest_0.6.32      
 [57] timechange_0.2.0    R6_2.5.1            mime_0.12           colorspace_2.1-0    jpeg_0.1-10         utf8_1.2.3          generics_0.1.3     
 [64] hexbin_1.28.3       data.table_1.14.8   corpcor_1.6.10      class_7.3-22        robustbase_0.95-1   httr_1.4.6          IDPmisc_1.1.20     
 [71] whisker_0.4.1       pkgconfig_2.0.3     gtable_0.3.3        RProtoBufLib_2.12.0 pcaPP_2.0-3         htmltools_0.5.5     RBGL_1.76.0        
 [78] scales_1.2.1        Biobase_2.60.0      png_0.1-8           knitr_1.43          rstudioapi_0.14     tzdb_0.4.0          reshape2_1.4.4     
 [85] nlme_3.1-162        curl_5.0.1          proxy_0.4-27        zoo_1.8-12          KernSmooth_2.23-21  parallel_4.3.0      fda_6.0.5          
 [92] pillar_1.9.0        vctrs_0.6.3         promises_1.2.0.1    cytolib_2.12.0      xtable_1.8-4        cluster_2.1.4       Rgraphviz_2.44.0   
 [99] evaluate_0.21       zeallot_0.1.0       mvtnorm_1.1-3       cli_3.6.1           compiler_4.3.0      rlang_1.1.1         rrcov_1.7-3        
[106] mclust_6.0.0        interp_1.1-4        fds_1.8             plyr_1.8.8          stringi_1.7.12      rainbow_3.7         viridisLite_0.4.2  
[113] deldir_1.0-9        BiocParallel_1.34.2 hdrcde_3.4          munsell_0.5.0       glmnet_4.1-7        Matrix_1.5-4.1      hms_1.1.3          
[120] shiny_1.7.4         DEoptimR_1.0-13  
t-kalinowski commented 1 year ago

Error in py_get_attr_impl(x, name, silent) : AttributeError: partially initialized module 'tensorflow' has no attribute 'VERSION' (most likely due to a circular import)

This has been fixed in the development version of keras:

remotes::install_github("rstudio/keras")
keras::install_keras()
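
After reinstalling, it may help to confirm in a fresh R session that reticulate binds to the intended environment and that TensorFlow imports cleanly. The following is only a minimal sketch: `check_tensorflow_env()` is a hypothetical helper name, not part of keras, tensorflow, or reticulate, and it assumes the environment is a conda environment like the one in the original report.

```r
# Hypothetical helper (not part of keras, tensorflow, or reticulate).
# Collects the Python and TensorFlow details that reticulate resolves.
check_tensorflow_env <- function(envname = NULL) {
  if (!is.null(envname)) {
    # Bind reticulate to the given conda environment before Python initializes.
    reticulate::use_condaenv(envname, required = TRUE)
  }

  config <- reticulate::py_config()
  tf_available <- reticulate::py_module_available("tensorflow")

  list(
    python = config$python,
    python_version = as.character(config$version),
    tensorflow_available = tf_available,
    # Only query the TensorFlow version when the module can be imported.
    tensorflow_version = if (tf_available) {
      as.character(tensorflow::tf_version())
    } else {
      NA_character_
    }
  )
}
```

Calling `str(check_tensorflow_env("/ceph/project/borrowlab/qnguyen/Temporary/F37_LIVECD3/r-reticulate"))` at the start of the job script, before `library(infinityFlow)`, would then show which interpreter is in use. If the import still crashes at this point, the problem likely lies in the Python environment itself rather than in the R packages.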