rocker-org / ml

experimental machine learning container
GNU General Public License v2.0

Default tensorflow install now requires CUDA 9 #4

Closed restonslacker closed 2 years ago

restonslacker commented 6 years ago

With the release of TF 1.5, the prebuilt binaries are built against CUDA 9 (https://github.com/tensorflow/tensorflow/releases/tag/v1.5.0) instead of CUDA 8. The Dockerfile should probably either derive from the 9.0-cudnn7-runtime container, or keras should install version 1.4 (e.g. keras::install_tensorflow(version="1.4.0-gpu")). By default, keras now appears to install 1.5.

eddelbuettel commented 6 years ago

Can you test and PR?

restonslacker commented 6 years ago

Looks like the PR will be for the second option (TF 1.4). I tried the easy, obvious fix of inheriting from nvidia/cuda:9.0-cudnn7-runtime but got the following error:

> library(keras)
> mnist <- dataset_mnist()
/root/.virtualenvs/r-tensorflow/lib/python2.7/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
Using TensorFlow backend.
Error: ImportError: Traceback (most recent call last):
  File "/root/.virtualenvs/r-tensorflow/lib/python2.7/site-packages/tensorflow/python/pywrap_tensorflow.py", line 58, in <module>
    from tensorflow.python.pywrap_tensorflow_internal import *
  File "/root/.virtualenvs/r-tensorflow/lib/python2.7/site-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 28, in <module>
    _pywrap_tensorflow_internal = swig_import_helper()
  File "/root/.virtualenvs/r-tensorflow/lib/python2.7/site-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 24, in swig_import_helper
    _mod = imp.load_module('_pywrap_tensorflow_internal', fp, pathname, description)
ImportError: libcublas.so.8.0: cannot open shared object file: No such file or directory

Failed to load the native TensorFlow runtime.

See https://www.tensorflow.org/install/install_sources#common_installation_problems

for some common reasons and solutions.  Include the entire stack trace
above this error message when

I didn't go through the full set of things in the suggested link, so it's possible there's a configuration error on my end, but I need to explore more.
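
One way to narrow this down: the ImportError names the exact shared library the installed TensorFlow binary was linked against (here libcublas.so.8.0, i.e. a CUDA 8 build). The sketch below is only an illustration, not part of the container. The helper names `missing_library` and `can_load` are hypothetical. It pulls the library name and version out of such an error message, then asks the dynamic loader which libcublas versions the image actually provides.

```python
import ctypes
import re

# Matches sonames such as "libcublas.so.8.0" inside a loader error message.
_SONAME = re.compile(r"(lib[\w+-]+)\.so\.(\d+(?:\.\d+)*)")


def missing_library(message):
    """Return (library, version) from a 'cannot open shared object file' error, or None."""
    for line in message.splitlines():
        if "cannot open shared object file" in line:
            match = _SONAME.search(line)
            if match:
                return match.group(1), match.group(2)
    return None


def can_load(soname):
    """Return True if the dynamic loader can open the given shared library."""
    try:
        ctypes.CDLL(soname)
        return True
    except OSError:
        # Raised when the library is absent or cannot be loaded.
        return False


if __name__ == "__main__":
    # Error line copied from the traceback above.
    error = ("ImportError: libcublas.so.8.0: cannot open shared object file: "
             "No such file or directory")
    print("TensorFlow expects:", missing_library(error))

    # Check which CUDA runtime versions this image can actually satisfy.
    for version in ("8.0", "9.0"):
        name = "libcublas.so." + version
        print(name, "loadable" if can_load(name) else "not found")
```

If the version in the error differs from the one that loads, the installed TensorFlow build and the base image disagree about the CUDA version, regardless of which TF release was intended.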

restonslacker commented 6 years ago

Did a little more digging. The problem may be with the current CRAN version of the keras package. When I install the GitHub version of keras, TF 1.5 runs correctly (or at least my little test succeeds). Will keep poking as I have time.