h2oai / deepwater

Deep Learning in H2O using Native GPU Backends
Apache License 2.0

Clarify whether Python 3 is supported with each of mxnet and tensorflow #51

Open: DarrenCook opened this issue 7 years ago

DarrenCook commented 7 years ago

The deepwater .whl file is tagged "py2.py3", but the mxnet and tensorflow links point only to py2 builds.

I've tried pip3 install mxnet, which installed "mxnet-0.11.0-py2.py3-none-manylinux1_x86_64.whl", but it still fails with "java.lang.RuntimeException: Unable to initialize the native Deep Learning backend: /tmp/libmxnet.so: libcudart.so.8.0: cannot open shared object file: No such file or directory"

I don't know whether this is due to mxnet 0.7 vs. mxnet 0.11, Python 2.7 vs. 3.5, or some other configuration step I've missed.

(BTW, I'm working through the deeplearning_mnist_introduction Jupyter notebook on Linux Mint 18.1, 64-bit. h2o is 3.13.0.369, H2O API Extensions reports "XGBoost, Algos, AutoML, Core V3, Core V4", and Python is 3.5.2.)
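One way to narrow down the libcudart.so.8.0 error is to ask the dynamic loader directly whether each shared library it complains about can be found. The sketch below is a minimal diagnostic, assuming it is run with the same LD_LIBRARY_PATH as the H2O/Java process; the library list is just the sonames mentioned in this thread, and can_load is a hypothetical helper, not part of deepwater.

```python
import ctypes

# Sonames that appear in the error messages in this thread. Which ones
# actually matter depends on how libmxnet.so was built (assumption).
LIBS = ["libcudart.so.8.0", "libcudnn.so.5", "libcblas.so.3"]


def can_load(soname):
    """Return True if the dynamic loader can resolve and load `soname`."""
    try:
        ctypes.CDLL(soname)
        return True
    except OSError:
        # Raised when the library (or one of its own dependencies) is missing.
        return False


if __name__ == "__main__":
    for name in LIBS:
        print(name, "OK" if can_load(name) else "MISSING")
```

A "MISSING" line for libcudart.so.8.0 would point at the CUDA installation or library path rather than at the Python version.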

If I try backend="tensorflow" instead, I get:

java.lang.RuntimeException: Unable to initialize the native Deep Learning backend: Python Tensorflow not installed on this machine. Please run 'pip install tensorflow[-gpu]' first.

This was after running pip3 install tensorflow, which installed "tensorflow-1.3.0-cp35-cp35m-manylinux1_x86_64.whl". (Again, is the problem tensorflow 1.3 vs. 1.1, Python 2.7 vs. 3.5, or something else?)

BTW, the results are identical with and without gpu=False, for both mxnet and tensorflow.
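To rule out the Python 2.7 vs. 3.5 question, it can help to check which interpreters on the PATH can actually import tensorflow. This is a minimal sketch: has_module is a hypothetical helper, and the thread does not say which interpreter deepwater uses to detect tensorflow, so a mismatch here is a clue rather than a diagnosis.

```python
import subprocess
import sys


def has_module(interpreter, module):
    """Return True if running `interpreter -c "import <module>"` succeeds."""
    try:
        result = subprocess.run(
            [interpreter, "-c", "import " + module],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except OSError:
        # The interpreter is not on PATH or cannot be executed.
        return False
    return result.returncode == 0


if __name__ == "__main__":
    # "python" may still be Python 2.7 on this system, while pip3 installs
    # into Python 3; sys.executable is the interpreter running this script.
    for exe in ("python", "python3", sys.executable):
        print(exe, "tensorflow importable:", has_module(exe, "tensorflow"))
```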

DarrenCook commented 7 years ago

I just tried again, because since then I've got both mxnet and tensorflow running, with and without GPU, which required jumping through a few more installation hoops.

Tensorflow still fails with the same complaint, despite tensorflow-gpu appearing in pip3 list and definitely working.

mxnet failed with a different error message: "libcblas.so.3: cannot open shared object file". So I ran sudo apt-get install libatlas-base-dev, and now it complains about libcudnn.so.5. Getting tensorflow 1.3 working required libcudnn v6, so this is what I now have installed:

lib64/libcudnn.so -> libcudnn.so.6
lib64/libcudnn.so.6 -> libcudnn.so.6.0.21
lib64/libcudnn.so.6.0.21*

See https://github.com/apache/incubator-mxnet/issues/5835. That is, it appears that if you update to at least mxnet 0.9, cuDNN will be included and this kind of dependency problem will go away.
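The libcudnn.so.5 complaint follows from how shared libraries are resolved: a binary records the exact soname it needs, so a libcudnn.so.6 file does not satisfy a request for libcudnn.so.5. A minimal sketch of that check, using a hypothetical missing_sonames helper and an assumed CUDA library directory:

```python
import os


def missing_sonames(required, available):
    """Return the entries of `required` that are absent from `available`.

    The dynamic loader looks a dependency up by its exact soname
    (e.g. "libcudnn.so.5"), so a libcudnn.so.6 file does not satisfy it.
    """
    present = set(available)
    return [name for name in required if name not in present]


if __name__ == "__main__":
    lib_dir = "/usr/local/cuda/lib64"  # hypothetical location; adjust as needed
    files = os.listdir(lib_dir) if os.path.isdir(lib_dir) else []
    print(missing_sonames(["libcudnn.so.5", "libcudnn.so.6"], files))
```

Against the listing above, missing_sonames(["libcudnn.so.5"], ["libcudnn.so", "libcudnn.so.6", "libcudnn.so.6.0.21"]) returns ["libcudnn.so.5"].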

mdymczyk commented 7 years ago

@DarrenCook I'll try to look at this issue ASAP, but we are a bit short-staffed. tensorflow-1.3.0-cp35-cp35m-manylinux1_x86_64.whl might be the issue, and we should document it: internally we use a different TF version (1.1.0, IIRC), and using a different one might cause problems if your dataset requires us to generate a new TF meta file.

szha commented 7 years ago

I've been doing the packaging for mxnet. Let me know if any clarification is needed.

DarrenCook commented 6 years ago

BTW, I just tried the Docker version (CPU only, so far): mxnet, tensorflow and caffe all work, and I notice it uses Python 3.5.2.

pip3 list shows tensorflow 1.1.0. So I think that is the missing information: Python 3.5 is fine, but tensorflow 1.3.x is not. Likewise, mxnet 0.7 is fine, but 0.11 is not. (Perhaps the root cause of both is the CUDA or cuDNN version?)

(And I see tensorflow is now at 1.4 and mxnet at 0.12. It is really hard to stay on top of everything, isn't it!)

Anyway, I think that if the README were updated to say tensorflow 1.1 and mxnet 0.7 only, this issue could be closed.
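Until the README says so, a quick local check against the versions reported to work here (tensorflow 1.1 and mxnet 0.7, from the Docker image) might look like the sketch below. KNOWN_GOOD, DISTRIBUTIONS and is_known_good are hypothetical names, the table holds only what this thread reports rather than an official compatibility list, and importlib.metadata requires Python 3.8+ (on Python 3.5, pip3 list gives the same information).

```python
from importlib.metadata import PackageNotFoundError, version  # Python 3.8+

# Versions reported to work in this thread (Docker image); not an official list.
KNOWN_GOOD = {"tensorflow": "1.1", "mxnet": "0.7"}

# pip distribution names to look for per backend (tensorflow-gpu is mentioned above).
DISTRIBUTIONS = {"tensorflow": ["tensorflow", "tensorflow-gpu"], "mxnet": ["mxnet"]}


def is_known_good(backend, ver):
    """True if `ver` has the same major.minor as the reported working release."""
    expected = KNOWN_GOOD.get(backend)
    if expected is None:
        return False
    return ver.split(".")[:2] == expected.split(".")[:2]


if __name__ == "__main__":
    for backend, dists in DISTRIBUTIONS.items():
        for dist in dists:
            try:
                ver = version(dist)
            except PackageNotFoundError:
                continue  # this distribution is not installed
            status = "OK" if is_known_good(backend, ver) else "not the reported version"
            print(dist, ver, status)
```

Comparing only major.minor treats 1.1.0 and 1.1.x alike, which matches the granularity of the versions discussed in this thread.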

mdymczyk commented 6 years ago

@DarrenCook Yes, what you wrote is 100% correct. We are very short-staffed nowadays (since we started h2o4gpu and Driverless development), so there's really not much development going on here.