h2oai / deepwater

Deep Learning in H2O using Native GPU Backends
Apache License 2.0

TensorFlow will not build meta graphs #29

Closed Jakovitz closed 7 years ago

Jakovitz commented 7 years ago

Hi

I posted this question on the community forum as well, but I'm not sure whether it's a bug or me using TensorFlow wrong:

I cannot get TensorFlow to build a meta graph when using it with Deep Water. It works fine when I align the input to match one of the included sizes (e.g. mlp_8x1x1_10.meta).

I built everything from master yesterday on CentOS 7 (selinux is disabled).

The following output is from one of my many experiments with the MNIST training set demo.

INFO: Hidden layers: [200, 200]
INFO: Activation function: Rectifier
INFO: Input dropout ratio: 0.0
INFO: Hidden layer dropout ratio: [0.0, 0.0]
INFO: Creating a fresh model of the following network type: MLP
ERRR: java.lang.RuntimeException: Unable to initialize the native Deep Learning backend: resource mlp_784x1x1_10.meta not found.
ERRR: at hex.deepwater.DeepWaterModelInfo.setupNativeBackend(DeepWaterModelInfo.java:246)
ERRR: at hex.deepwater.DeepWaterModelInfo.<init>(DeepWaterModelInfo.java:193)
ERRR: at hex.deepwater.DeepWaterModel.<init>(DeepWaterModel.java:225)
ERRR: at hex.deepwater.DeepWater$DeepWaterDriver.buildModel(DeepWater.java:127)
ERRR: at hex.deepwater.DeepWater$DeepWaterDriver.computeImpl(DeepWater.java:114)
ERRR: at hex.ModelBuilder$Driver.compute2(ModelBuilder.java:169)
ERRR: at hex.deepwater.DeepWater$DeepWaterDriver.compute2(DeepWater.java:107)
ERRR: at water.H2O$H2OCountedCompleter.compute(H2O.java:1220)
ERRR: at jsr166y.CountedCompleter.exec(CountedCompleter.java:468)
ERRR: at jsr166y.ForkJoinTask.doExec(ForkJoinTask.java:263)
ERRR: at jsr166y.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:974)
ERRR: at jsr166y.ForkJoinPool.runWorker(ForkJoinPool.java:1477)
ERRR: at jsr166y.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:104)
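For anyone comparing the requested resource against the files that ship with a build, here is a rough sketch of the naming pattern as it appears in this thread. It is only inferred from the quoted file names (a network type, three numbers separated by "x", then what looks like the class count); the helper names `graph_name`, `parse_graph_name` and `is_bundled` are made up for illustration and are not part of Deep Water.

```python
import re

# Assumption: this pattern is inferred only from the file names quoted in this
# thread (mlp_8x1x1_10.meta, mlp_784x1x1_10.meta, mlp_784x0x0_10.pb). It is
# not taken from the Deep Water source code.
GRAPH_NAME = re.compile(
    r"^(?P<network>[A-Za-z0-9]+)_"
    r"(?P<d0>\d+)x(?P<d1>\d+)x(?P<d2>\d+)_"
    r"(?P<classes>\d+)\.(?P<ext>meta|pb)$"
)


def graph_name(network, dims, classes, ext="meta"):
    """Compose a resource name such as 'mlp_784x1x1_10.meta' (hypothetical helper)."""
    d0, d1, d2 = dims
    return f"{network}_{d0}x{d1}x{d2}_{classes}.{ext}"


def parse_graph_name(name):
    """Split a resource name into its parts, or raise ValueError if it does not fit."""
    match = GRAPH_NAME.match(name)
    if match is None:
        raise ValueError(f"unrecognized graph resource name: {name!r}")
    return {
        "network": match["network"],
        "dims": (int(match["d0"]), int(match["d1"]), int(match["d2"])),
        "classes": int(match["classes"]),
        "ext": match["ext"],
    }


def is_bundled(name, bundled_names):
    """Return True if the requested resource is among the names that were shipped."""
    return name in set(bundled_names)


if __name__ == "__main__":
    # MNIST as in the log above: 784 inputs, 10 classes.
    wanted = graph_name("mlp", (784, 1, 1), 10)
    # Only this bundled name is mentioned in the thread; a real build may ship others.
    bundled = ["mlp_8x1x1_10.meta"]
    print(wanted, "bundled:", is_bundled(wanted, bundled))
    print(parse_graph_name("mlp_784x0x0_10.pb"))
```

With only the one bundled name mentioned here, the MNIST shape is reported as not bundled, which matches the "resource ... not found" error.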

Jakovitz commented 7 years ago

Additionally, I tried the prebuilt version on Ubuntu 16.04, with a similar result. I'm pretty sure the "/home/fmilo/..." path in the error is a bug, but I created that directory anyway, with no luck.

INFO: Hidden layers: [200, 200]
INFO: Activation function: Rectifier
INFO: Input dropout ratio: 0.0
INFO: Hidden layer dropout ratio: [0.0, 0.0]
INFO: Creating a fresh model of the following network type: MLP
ERRR: java.lang.RuntimeException: Unable to initialize the native Deep Learning backend: /home/fmilo/workspace/deepwater/tensorflow/src/main/resources/mlp_784x0x0_10.pb
ERRR: at hex.deepwater.DeepWaterModelInfo.setupNativeBackend(DeepWaterModelInfo.java:246)
ERRR: at hex.deepwater.DeepWaterModelInfo.<init>(DeepWaterModelInfo.java:193)
ERRR: at hex.deepwater.DeepWaterModel.<init>(DeepWaterModel.java:218)
ERRR: at hex.deepwater.DeepWater$DeepWaterDriver.buildModel(DeepWater.java:122)
ERRR: at hex.deepwater.DeepWater$DeepWaterDriver.computeImpl(DeepWater.java:109)
ERRR: at hex.ModelBuilder$Driver.compute2(ModelBuilder.java:169)
ERRR: at hex.deepwater.DeepWater$DeepWaterDriver.compute2(DeepWater.java:102)
ERRR: at water.H2O$H2OCountedCompleter.compute(H2O.java:1203)
ERRR: at jsr166y.CountedCompleter.exec(CountedCompleter.java:468)
ERRR: at jsr166y.ForkJoinTask.doExec(ForkJoinTask.java:263)
ERRR: at jsr166y.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:974)
ERRR: at jsr166y.ForkJoinPool.runWorker(ForkJoinPool.java:1477)
ERRR: at jsr166y.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:104)

Jakovitz commented 7 years ago

I noticed there's a new build, but that fails with "No backend found". I have also compiled everything on Ubuntu 16.04 now, and it gives me the same error as in my original post.

arnocandel commented 7 years ago

That latest build only comes with MXNet; we're still working on a releasable version for TF.

Jakovitz commented 7 years ago

OK, thanks. I also saw a video explaining that H2O can't build meta graphs yet, so we'll just wait until TF support has matured a bit.