h2oai / deepwater

Deep Learning in H2O using Native GPU Backends
Apache License 2.0

Unable to initialize the native Deep Learning backend: No backend found. #65

Open nasica88 opened 6 years ago

nasica88 commented 6 years ago

Environment
OS platform, distribution and version: Redhat 7.5 ALT ppc64le
Python version (optional): python 3.6 from DriverlessAI
CUDA/cuDNN version: CUDA 9.2, cuDNN 7.1
GPU model (optional): V100
CPU model: POWER9
RAM available: 512GB
R version: 3.4.1
Tensorflow version: 1.8.0 (built from source)

I am trying to use h2o.deepwater, which is included in DriverlessAI, from Python and R environments instead of the DAI web GUI. I'd also like to use tensorflow as the backend.

For this, I set the environment variables to use the python from DriverlessAI.

$ export PATH=/opt/h2oai/dai/python/bin:$PATH
$ export LD_LIBRARY_PATH=/opt/h2oai/dai/python/lib:/opt/h2oai/dai/lib:$LD_LIBRARY_PATH
$ export PYTHONPATH=/opt/h2oai/dai/cuda-9.2/lib/python3.6/site-packages
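A quick way to confirm that the three exports above took effect in the current shell is to inspect them from Python. This is only a minimal sketch: the helper name `missing_entries` is hypothetical, and the expected paths are copied from the export commands above, so adjust them if your install differs.

```python
import os

# Paths copied from the export commands above (assumed, not verified).
EXPECTED = {
    "PATH": ["/opt/h2oai/dai/python/bin"],
    "LD_LIBRARY_PATH": ["/opt/h2oai/dai/python/lib", "/opt/h2oai/dai/lib"],
    "PYTHONPATH": ["/opt/h2oai/dai/cuda-9.2/lib/python3.6/site-packages"],
}


def missing_entries(env, expected=EXPECTED):
    """Return {variable: [expected paths not present]} for an environment mapping."""
    missing = {}
    for name, paths in expected.items():
        # Split the variable on the platform path separator (":" on Linux).
        present = env.get(name, "").split(os.pathsep)
        absent = [p for p in paths if p not in present]
        if absent:
            missing[name] = absent
    return missing


if __name__ == "__main__":
    problems = missing_entries(os.environ)
    if problems:
        for name, paths in problems.items():
            print(f"{name} is missing: {', '.join(paths)}")
    else:
        print("All expected entries are present.")
```

Run it in the same shell session as the exports. An empty result only shows that the variables are set as intended for that shell; it says nothing about the environment of an H2O process that was started elsewhere.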

This works fine with h2o.deeplearning.

gpu_xgb <- h2o.deeplearning(x = c("TemperatureCelcius","ExhaustVacuumHg","AmbientPressureMillibar","RelativeHumidity"),
                            y = "HourlyEnergyOutputMW",
                            training_frame = train)

However, h2o.deepwater fails with the error "Unable to initialize the native Deep Learning backend: No backend found. Cannot build a Deep Water model."

Below are the script and the error output from running h2o.deepwater in R with the tensorflow backend.

$ cat t4.R
# Package Load
library(reticulate)
use_python("/opt/h2oai/dai/python/bin/python")
library(Metrics)
library(h2o)
h2o.init(max_mem_size = "500g")
# Data Load
df <- read.csv('/data/rpjt/R_script/user/yslee/powerplant_output.csv')
# Randomly sample 80% of the rows for the training set
set.seed(1)
train_idx <- sample(1:nrow(df), 0.8*nrow(df))
# h2o Dataset
train <- df[train_idx,]
test <- df[-train_idx,]
train <- as.h2o(train, col.types=c("string"))
test <- as.h2o(test, col.types=c("string"))
# h2o.deepwater model
gpu_dl <- h2o.deepwater(x = c("TemperatureCelcius","ExhaustVacuumHg","AmbientPressureMillibar","RelativeHumidity"),
                        y = "HourlyEnergyOutputMW",
                        training_frame = train,
                        backend = "tensorflow",
                        hidden = 10,
                        standardize = T,
                        activation = "Tanh",
                        seed = 1234)
h2o.performance(gpu_dl, newdata = test)

$ Rscript t4.R
...
R is connected to the H2O cluster:
H2O cluster uptime: 16 minutes 30 seconds
H2O cluster timezone: Asia/Seoul
H2O data parsing timezone: UTC
H2O cluster version: 3.20.0.2
H2O cluster version age: 1 month and 22 days
H2O cluster name: dai
H2O cluster total nodes: 1
H2O cluster total memory: 227.37 GB
H2O cluster total cores: 128
H2O cluster allowed cores: 128
H2O cluster healthy: TRUE
H2O Connection ip: localhost
H2O Connection port: 54321
H2O Connection proxy: NA
H2O Internal Security: FALSE
H2O API Extensions: Algos, MLI, MLI-Driver, AutoML, Core V3, Core V4
R Version: R version 3.4.1 (2017-06-30)

|======================================================================| 100%
|======================================================================| 100%
| | 0%

java.lang.RuntimeException: Unable to initialize the native Deep Learning backend: No backend found. Cannot build a Deep Water model.

java.lang.RuntimeException: Unable to initialize the native Deep Learning backend: No backend found. Cannot build a Deep Water model.
    at hex.deepwater.DeepWaterModelInfo.setupNativeBackend(DeepWaterModelInfo.java:267)
    at hex.deepwater.DeepWaterModelInfo.<init>(DeepWaterModelInfo.java:214)
    at hex.deepwater.DeepWaterModel.<init>(DeepWaterModel.java:227)
    at hex.deepwater.DeepWater$DeepWaterDriver.buildModel(DeepWater.java:131)
    at hex.deepwater.DeepWater$DeepWaterDriver.computeImpl(DeepWater.java:118)
    at hex.ModelBuilder$Driver.compute2(ModelBuilder.java:214)
    at hex.deepwater.DeepWater$DeepWaterDriver.compute2(DeepWater.java:111)
    at water.H2O$H2OCountedCompleter.compute(H2O.java:1260)
    at jsr166y.CountedCompleter.exec(CountedCompleter.java:468)
    at jsr166y.ForkJoinTask.doExec(ForkJoinTask.java:263)
    at jsr166y.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:974)
    at jsr166y.ForkJoinPool.runWorker(ForkJoinPool.java:1477)
    at jsr166y.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:104)

Error: java.lang.RuntimeException: Unable to initialize the native Deep Learning backend: No backend found. Cannot build a Deep Water model.
Execution halted

install.packages("tensorflow") and library(tensorflow) worked fine in R:

$ ls -l /usr/local/lib64/R/library/tensorflow
total 12
-rw-rw-r-- 1 root root 2456 Aug 7 17:45 DESCRIPTION
drwxrwxr-x 5 root root 112 Aug 7 17:45 examples
drwxrwxr-x 2 root root 125 Aug 7 17:45 help
drwxrwxr-x 2 root root 39 Aug 7 17:45 html
-rw-rw-r-- 1 root root 1095 Aug 7 17:45 INDEX
drwxrwxr-x 2 root root 113 Aug 7 17:45 Meta
-rw-rw-r-- 1 root root 2713 Aug 7 17:45 NAMESPACE
drwxrwxr-x 2 root root 84 Aug 7 17:45 R

Also, tensorflow is installed in the Python from DriverlessAI.

$ which python
/opt/h2oai/dai/python/bin/python

$ pip list | grep tensorflow
tensorflow 1.8.0
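`which python` and `pip list` describe what the shell resolves; the same check can be made from inside the interpreter itself using only the standard library. The sketch below is a hedged illustration, and the helper name `module_location` is hypothetical. It covers the Python side only: the exception above is raised inside the H2O Java process (`hex.deepwater.DeepWaterModelInfo.setupNativeBackend`), so a positive result here does not by itself show that Deep Water can load a backend.

```python
import importlib.util
import sys


def module_location(name):
    """Return where a top-level module would be loaded from, or None if it is not found."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    # Namespace packages may have no single origin file; fall back to their search paths.
    return spec.origin or str(list(spec.submodule_search_locations or []))


if __name__ == "__main__":
    print("interpreter:", sys.executable)
    print("tensorflow :", module_location("tensorflow") or "not importable from this interpreter")
```

If the reported interpreter is not /opt/h2oai/dai/python/bin/python, or tensorflow resolves to an unexpected location, the shell and the running interpreter disagree about the environment.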