tensorflow / tensorflow

An Open Source Machine Learning Framework for Everyone
https://tensorflow.org
Apache License 2.0

ImportError: libcublas.so.10.0: cannot open shared object file: No such file or directory #26182

Closed gian1312 closed 5 years ago

gian1312 commented 5 years ago

Please make sure that this is a build/installation issue. As per our GitHub Policy, we only address code/doc bugs, performance issues, feature requests and build/installation issues on GitHub. tag:build_template

System information

I was using tensorflow gpu last year. I wanted to set it up again. I got it running on my Windows 10 partition. Now I have tried to set it up again on my Mint partition. I always get the following error. ImportError: libcublas.so.10.0: cannot open shared object file: No such file or directory. I thought TF needs cuda 9.0 and not 10.0?

The error occurs if I execute the following code.

import tensorflow as tf
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))

ppwwyyxx commented 5 years ago
jvishnuvardhan commented 5 years ago

@gian1312 I think it is looking for a CUDA 10 file. The error is due to a mismatch in CUDA version. The best approach is to install TF from a clean state. Please follow @ppwwyyxx's suggestion and pick the combination that fits your needs (TF 1.12 with CUDA 9.0, or TF 1.13 with CUDA 10.0). Please uninstall Python and TensorFlow, then follow the instructions to install TF fresh. Please let me know how it progresses. Thanks!
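
As a rough illustration of the pairing above, the mismatch can be written as a lookup from the TensorFlow release to the cuBLAS file it tries to load. This is only a sketch: the table holds just the two pairings named in this thread, and the helper name `expected_libcublas` is made up for the example.

```python
# Pairings taken from this thread only; other releases are deliberately absent.
TF_TO_CUDA = {
    "1.12": "9.0",
    "1.13": "10.0",
}


def expected_libcublas(tf_version):
    """Return the libcublas file name a given TF release is expected to load.

    Hypothetical helper: it only knows the two pairings listed above and
    returns None for anything else.
    """
    major_minor = ".".join(tf_version.split(".")[:2])
    cuda = TF_TO_CUDA.get(major_minor)
    return None if cuda is None else "libcublas.so." + cuda


print(expected_libcublas("1.13.1"))  # libcublas.so.10.0
print(expected_libcublas("1.12.0"))  # libcublas.so.9.0
```

If the file name printed for your TF release is not present on the machine, the import fails with the error from the issue title.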

rhinsall commented 5 years ago

identical problem here.

clean installation of Nvidia drivers, CUDA 10.1 and TF

libcublas.so.10.0 error as soon as TF is called.

Ubuntu 18.04.2 LTS; Also Anaconda install of Python 3.7 (is the anaconda install relevant?); 2070

jvishnuvardhan commented 5 years ago

@rhinsall Which TF version are you trying to install? Could you install CUDA 10, or correctly reference the CUDA 10.1 path in cuDNN? Thanks

OmnipotentEntity commented 5 years ago

It does not seem possible to install Tensorflow with default packaging on Ubuntu 18.04. You have to either build TF from scratch, which requires sourcing an older version of bazel than is available through the default repositories, or manually install specific versions of nvidia drivers and libraries.

None of the linked wheels from upthread are yet built against CUDA 10.1.

gian1312 commented 5 years ago

Thanks a lot. I relied on the website and hadn't realised that a new version came out a few days ago. I am sorry. I downgraded to 1.12. Now my graphics card is found with the code mentioned above.

Sadly, the code (an example from a lecture I attend), which runs perfectly fine on my Windows installation (30 s), takes 6 min on my Linux installation and puts the CPU under load. Is there a workaround to force TensorFlow to use the GPU?

rhinsall commented 5 years ago

@rhinsall Which TF version are you trying to install? Could you install CUDA 10, or correctly reference the CUDA 10.1 path in cuDNN? Thanks

I'll come home much later and report the exact numbers and paths - but it's a fresh install, downloaded yesterday, CUDA 10.1 per Nvidia's instructions and TF clean install using PIP & Python 3.7

ghost commented 5 years ago

@rhinsall I just found this out myself, not sure if it's common knowledge, but got around this by doing

conda install cudatoolkit
conda install cudnn

I have cuda-10.1 installed on my box, this installed a local conda-only cuda-10.0. Obviously this is to just keep tensorflow working while waiting for better support.

rhinsall commented 5 years ago

Excellent advice. Immediate rescue. Thank you very much fabricatedmath.

jvishnuvardhan commented 5 years ago

@gian1312 That is strange. There is a guide on using the GPU here. Using those instructions you can force TF to use a GPU. Sometimes it is better to uninstall and reinstall TF. Please let me know how it progresses. If the issue is resolved, please close the ticket. Thanks!

ivineetm007 commented 5 years ago

Hi, I am having a similar problem. I created a new conda environment and installed tensorflow-gpu as follows:
`conda install tensorflow-gpu
Collecting package metadata: done
Solving environment: done

Package Plan

environment location: /home/lasii/anaconda3/envs/drunk2

added / updated specs:

The following packages will be downloaded:

package                    |            build
---------------------------|-----------------
_tflow_select-2.1.0        |              gpu           2 KB  defaults
absl-py-0.4.1              |           py35_0         144 KB  defaults
astor-0.7.1                |           py35_0          43 KB  defaults
cupti-9.2.148              |                0         1.7 MB  defaults
gast-0.2.0                 |           py35_0          15 KB  defaults
grpcio-1.12.1              |   py35hdbcaa40_0         1.7 MB  defaults
libprotobuf-3.6.0          |       hdbcaa40_0         4.1 MB  defaults
markdown-2.6.11            |           py35_0         104 KB  defaults
mkl_fft-1.0.6              |   py35h7dd41cf_0         149 KB  defaults
mkl_random-1.0.1           |   py35h4414c95_1         362 KB  defaults
numpy-1.15.2               |   py35h1d66e8a_0          47 KB  defaults
numpy-base-1.15.2          |   py35h81de0dd_0         4.2 MB  defaults
protobuf-3.6.0             |   py35hf484d3e_0         615 KB  defaults
six-1.11.0                 |           py35_1          21 KB  defaults
tensorboard-1.10.0         |   py35hf484d3e_0         3.3 MB  defaults
tensorflow-1.10.0          |gpu_py35hd9c640d_0           3 KB  defaults
tensorflow-base-1.10.0     |gpu_py35had579c0_0       190.6 MB  defaults
tensorflow-gpu-1.10.0      |       hf154084_0           2 KB  defaults
termcolor-1.1.0            |           py35_1           7 KB  defaults
------------------------------------------------------------
                                       Total:       207.1 MB

The following NEW packages will be INSTALLED:

_tflow_select pkgs/main/linux-64::_tflow_select-2.1.0-gpu
absl-py pkgs/main/linux-64::absl-py-0.4.1-py35_0
astor pkgs/main/linux-64::astor-0.7.1-py35_0
blas pkgs/main/linux-64::blas-1.0-mkl
cudatoolkit pkgs/main/linux-64::cudatoolkit-9.2-0
cudnn pkgs/main/linux-64::cudnn-7.3.1-cuda9.2_0
cupti pkgs/main/linux-64::cupti-9.2.148-0
gast pkgs/main/linux-64::gast-0.2.0-py35_0
grpcio pkgs/main/linux-64::grpcio-1.12.1-py35hdbcaa40_0
intel-openmp pkgs/main/linux-64::intel-openmp-2019.1-144
libgfortran-ng pkgs/main/linux-64::libgfortran-ng-7.3.0-hdf63c60_0
libprotobuf pkgs/main/linux-64::libprotobuf-3.6.0-hdbcaa40_0
markdown pkgs/main/linux-64::markdown-2.6.11-py35_0
mkl pkgs/main/linux-64::mkl-2018.0.3-1
mkl_fft pkgs/main/linux-64::mkl_fft-1.0.6-py35h7dd41cf_0
mkl_random pkgs/main/linux-64::mkl_random-1.0.1-py35h4414c95_1
numpy pkgs/main/linux-64::numpy-1.15.2-py35h1d66e8a_0
numpy-base pkgs/main/linux-64::numpy-base-1.15.2-py35h81de0dd_0
protobuf pkgs/main/linux-64::protobuf-3.6.0-py35hf484d3e_0
six pkgs/main/linux-64::six-1.11.0-py35_1
tensorboard pkgs/main/linux-64::tensorboard-1.10.0-py35hf484d3e_0
tensorflow pkgs/main/linux-64::tensorflow-1.10.0-gpu_py35hd9c640d_0
tensorflow-base pkgs/main/linux-64::tensorflow-base-1.10.0-gpu_py35had579c0_0
tensorflow-gpu pkgs/main/linux-64::tensorflow-gpu-1.10.0-hf154084_0
termcolor pkgs/main/linux-64::termcolor-1.1.0-py35_1
werkzeug pkgs/main/linux-64::werkzeug-0.14.1-py35_0
`

After installation, I just imported tensorflow and got the error.

`Traceback (most recent call last):
  File "/home/lasii/.local/lib/python3.5/site-packages/tensorflow/python/pywrap_tensorflow.py", line 58, in <module>
    from tensorflow.python.pywrap_tensorflow_internal import *
  File "/home/lasii/.local/lib/python3.5/site-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 28, in <module>
    _pywrap_tensorflow_internal = swig_import_helper()
  File "/home/lasii/.local/lib/python3.5/site-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 24, in swig_import_helper
    _mod = imp.load_module('_pywrap_tensorflow_internal', fp, pathname, description)
  File "/home/lasii/anaconda3/envs/drunk2/lib/python3.5/imp.py", line 243, in load_module
    return load_dynamic(name, filename, file)
  File "/home/lasii/anaconda3/envs/drunk2/lib/python3.5/imp.py", line 343, in load_dynamic
    return _load(spec)
ImportError: libcublas.so.10.0: cannot open shared object file: No such file or directory

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "", line 1, in <module>
  File "/home/lasii/.local/lib/python3.5/site-packages/tensorflow/__init__.py", line 24, in <module>
    from tensorflow.python import pywrap_tensorflow # pylint: disable=unused-import
  File "/home/lasii/.local/lib/python3.5/site-packages/tensorflow/python/__init__.py", line 49, in <module>
    from tensorflow.python import pywrap_tensorflow
  File "/home/lasii/.local/lib/python3.5/site-packages/tensorflow/python/pywrap_tensorflow.py", line 74, in <module>
    raise ImportError(msg)
ImportError: Traceback (most recent call last):
  File "/home/lasii/.local/lib/python3.5/site-packages/tensorflow/python/pywrap_tensorflow.py", line 58, in <module>
    from tensorflow.python.pywrap_tensorflow_internal import *
  File "/home/lasii/.local/lib/python3.5/site-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 28, in <module>
    _pywrap_tensorflow_internal = swig_import_helper()
  File "/home/lasii/.local/lib/python3.5/site-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 24, in swig_import_helper
    _mod = imp.load_module('_pywrap_tensorflow_internal', fp, pathname, description)
  File "/home/lasii/anaconda3/envs/drunk2/lib/python3.5/imp.py", line 243, in load_module
    return load_dynamic(name, filename, file)
  File "/home/lasii/anaconda3/envs/drunk2/lib/python3.5/imp.py", line 343, in load_dynamic
    return _load(spec)
ImportError: libcublas.so.10.0: cannot open shared object file: No such file or directory

Failed to load the native TensorFlow runtime.

See https://www.tensorflow.org/install/errors `

I just started using github. Guide me if I am posting improperly.

codexponent commented 5 years ago

@ivineetm007 , Can you check the CUDA version!

ivineetm007 commented 5 years ago

@codexponent It's 9.2. Conda installed it automatically while installing tensorflow-gpu.

codexponent commented 5 years ago

I think you should update your CUDA version to 10 as well. This link will help you: https://www.nvidia.com/Download/index.aspx?lang=en-us

ivineetm007 commented 5 years ago

@codexponent I installed CUDA 10.0 in conda with

conda install -c fragcolor cuda10.0

Now there are two CUDA packages in the conda environment's package list:
cudatoolkit 9.2
cuda 10.0

But the same error occurs on importing tensorflow.

codexponent commented 5 years ago

@ivineetm007, can you run nvidia-smi and check the header of the table? I am sure that you need to update CUDA by downloading the NVIDIA driver from their website.

ivineetm007 commented 5 years ago

@codexponent The header shows: NVIDIA-SMI 396.54 Driver Version: 396.54

I am working on a college PC that is allotted to two or three students. I am not sure whether installing CUDA from a download would affect the other conda environments. A little history: I am using the code at this link (https://github.com/DevendraPratapYadav/gsoc18_RedHenLab/tree/master/video_processing_pipeline). There, the setup is done with conda. Two weeks ago, TensorFlow was working perfectly when running that code.
But someone updated conda on the PC. Now I am getting the libcublas.so.10.0 error.

codexponent commented 5 years ago

@ivineetm007, if this is not your PC, I suggest you don't update it, as it might break other environments that rely on CUDA 9. Instead, create a new environment, install TensorFlow with a specific version number (pip install tensorflow==1.10.0), and then test some very simple code such as adding two numbers (tf.add). See whether this runs.

ivineetm007 commented 5 years ago

@codexponent I tried your suggestion. It worked fine. Then I tried to install tf-gpu and keras as follows:

conda install -y -c anaconda tensorflow-gpu==1.7.0
conda install -y keras

Now I am getting this error:

AttributeError: module 'tensorflow.python.training.checkpointable' has no attribute 'CheckpointableBase'

I followed the solution for this error in the link (https://github.com/tensorflow/tensorflow/issues/20499l), which suggested reinstalling. I think some other version of tensorflow-gpu will work.

codexponent commented 5 years ago

@ivineetm007, try the same thing while opening a TF session on the GPU. This link may help: https://www.tensorflow.org/guide/using_gpu

Another solution: don't install anything from conda, just install from pip. Steps:
1) Create a fresh environment
2) pip install tensorflow==1.12.0
3) pip install tensorflow-gpu==1.12.0
4) pip install keras==2.1.3
If there is anything you want to install from conda, first check whether it is available from pip. If it is not: say your env name is my_env_1. After activating that environment, type which conda. If this gives the path to your created environment (...\my_env_1...), then you can install the other essential packages. If this gives (..\...), then type pip install conda, and then install the other essential packages. (Be sure to check again by typing which conda.)

lipingbj commented 5 years ago

Same problem. My CUDA version is 10.1, but the libcublas.so.10.0 file is not in the lib64 directory. I am installing tensorflow-gpu with the command 'pip install tensorflow-gpu'.

lipingbj commented 5 years ago

Same problem. My CUDA version is 10.1, but the libcublas.so.10.0 file is not in the lib64 directory. I am installing tensorflow-gpu with the command 'pip install tensorflow-gpu'.

It seems that the libcublas-version is removed by the cuda 10

codexponent commented 5 years ago

@lipingbj, did you update the CUDA version with a conda command or through the official NVIDIA site? I think installing from the official site might help to get those .so files. Link: https://www.nvidia.com/Download/index.aspx?lang=en-us

RazorBladeQuant commented 5 years ago

@lipingbj I had a similar issue when pushing an upgrade to TensorFlow code that calls 200 sagemakers in parallel. I solved it by pinning numpy to numpy==1.14.5 and tensorflow-gpu to 1.12.0. If you would like, I can paste the Dockerfile I created to make sure it works.

mostafaelhoushi commented 5 years ago

Same problem. My CUDA version is 10.1, but the libcublas.so.10.0 file is not in the lib64 directory. I am installing tensorflow-gpu with the command 'pip install tensorflow-gpu'.

It seems that the libcublas-version is removed by the cuda 10

After installing CUDA 10 I have found libcublas.so.10 under /usr/lib/x86_64-linux-gnu/. So you need to add /usr/lib/x86_64-linux-gnu/ to your library path by calling:

> export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/lib/x86_64-linux-gnu/

And also since TensorFlow is looking for libcublas.so.10.0 rather than libcublas.so.10 (without the last .0) you need to create a symlink:

ln -s /usr/lib/x86_64-linux-gnu/libcublas.so.10 /usr/lib/x86_64-linux-gnu/libcublas.so.10.0
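
To check whether the two steps above had any effect, one option is to ask the dynamic loader directly from Python. The following is a minimal sketch using the standard `ctypes` module; the helper name `can_load` is made up here, and it assumes `LD_LIBRARY_PATH` was exported before Python was started, since changes made afterwards are not picked up by the running process.

```python
import ctypes


def can_load(soname):
    """Return True if the dynamic loader can open `soname`, else False."""
    try:
        ctypes.CDLL(soname)
    except OSError:
        # Raised when the file is missing or cannot be loaded.
        return False
    return True


# The first name is the one TensorFlow asks for in this issue; the second is
# the name reported under /usr/lib/x86_64-linux-gnu/ in this comment.
for name in ("libcublas.so.10.0", "libcublas.so.10"):
    print(name, "loadable:", can_load(name))
```

If the first line prints False, importing TensorFlow is expected to fail in the same way.
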
mostafaelhoushi commented 5 years ago

Please look at the instructions here after installing CUDA 10: https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#environment-setup

priyakansal commented 5 years ago

hi all,

I am facing the same issue, but my problem is a little different. I am able to install and import tensorflow-gpu on my local machine as well as when building the docker container; everything works fine. But when I build my docker image from a Dockerfile and docker-compose-up...build, I get this error. Please help me out, I really don't know why this happens while building the docker image.

dattran2346 commented 5 years ago

After installing CUDA, you need to export $PATH and $LD_LIBRARY_PATH. TensorFlow uses these environment variables to locate the libraries. For example, if you installed CUDA under /usr/local/, you can add this to your .zshrc or .bashrc (depending on the shell you are using):

CUDA_VERSION=10.0

export PATH=/usr/local/cuda-$CUDA_VERSION/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-$CUDA_VERSION/lib64:$LD_LIBRARY_PATH

This trick can be used to change the version of cuda you want to use.
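
A quick way to confirm that the exported path actually contains the file is to scan the entries of `LD_LIBRARY_PATH`. This is only a sketch: the helper name `dirs_providing` is made up, and it looks at `LD_LIBRARY_PATH` alone, while the real loader also consults the ldconfig cache and the default system directories.

```python
import os


def dirs_providing(filename, search_path=None):
    """List the LD_LIBRARY_PATH entries that contain `filename`."""
    if search_path is None:
        search_path = os.environ.get("LD_LIBRARY_PATH", "")
    hits = []
    for entry in search_path.split(":"):
        # Empty entries are skipped in this sketch.
        if entry and os.path.exists(os.path.join(entry, filename)):
            hits.append(entry)
    return hits


print(dirs_providing("libcublas.so.10.0"))
```

An empty list means none of the exported directories holds the file, so either the path or the installed CUDA version needs another look.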

loretoparisi commented 5 years ago

@mostafaelhoushi I created the symlink, but this does not do the trick:

ln -s /usr/lib/x86_64-linux-gnu/libcublas.so.10 /usr/lib/x86_64-linux-gnu/libcublas.so.10
root@b55736f184ff:/notebooks# python3.6 -c "import tensorflow as tf; print(tf.__version__);"
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/pywrap_tensorflow.py", line 58, in <module>
    from tensorflow.python.pywrap_tensorflow_internal import *
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 28, in <module>
    _pywrap_tensorflow_internal = swig_import_helper()
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 24, in swig_import_helper
    _mod = imp.load_module('_pywrap_tensorflow_internal', fp, pathname, description)
  File "/usr/lib/python3.6/imp.py", line 243, in load_module
    return load_dynamic(name, filename, file)
  File "/usr/lib/python3.6/imp.py", line 343, in load_dynamic
    return _load(spec)
ImportError: libcublas.so.10.0: cannot open shared object file: No such file or directory

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/__init__.py", line 24, in <module>
    from tensorflow.python import pywrap_tensorflow  # pylint: disable=unused-import
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/__init__.py", line 49, in <module>
    from tensorflow.python import pywrap_tensorflow
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/pywrap_tensorflow.py", line 74, in <module>
    raise ImportError(msg)
ImportError: Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/pywrap_tensorflow.py", line 58, in <module>
    from tensorflow.python.pywrap_tensorflow_internal import *
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 28, in <module>
    _pywrap_tensorflow_internal = swig_import_helper()
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 24, in swig_import_helper
    _mod = imp.load_module('_pywrap_tensorflow_internal', fp, pathname, description)
  File "/usr/lib/python3.6/imp.py", line 243, in load_module
    return load_dynamic(name, filename, file)
  File "/usr/lib/python3.6/imp.py", line 343, in load_dynamic
    return _load(spec)
ImportError: libcublas.so.10.0: cannot open shared object file: No such file or directory

Failed to load the native TensorFlow runtime.
loretoparisi commented 5 years ago

@dattran2346 This should work with CUDA 10 already installed, but if you start from older Docker images, you may have an older version installed:

root@b55736f184ff:/notebooks# echo  $CUDA_VERSION
9.0.176
mostafaelhoushi commented 5 years ago

@mostafaelhoushi I created the symlink, but this does not do the trick:

ln -s /usr/lib/x86_64-linux-gnu/libcublas.so.10 /usr/lib/x86_64-linux-gnu/libcublas.so.10
root@b55736f184ff:/notebooks# python3.6 -c "import tensorflow as tf; print(tf.__version__);"
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/pywrap_tensorflow.py", line 58, in <module>
    from tensorflow.python.pywrap_tensorflow_internal import *
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 28, in <module>
    _pywrap_tensorflow_internal = swig_import_helper()
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 24, in swig_import_helper
    _mod = imp.load_module('_pywrap_tensorflow_internal', fp, pathname, description)
  File "/usr/lib/python3.6/imp.py", line 243, in load_module
    return load_dynamic(name, filename, file)
  File "/usr/lib/python3.6/imp.py", line 343, in load_dynamic
    return _load(spec)
ImportError: libcublas.so.10.0: cannot open shared object file: No such file or directory

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/__init__.py", line 24, in <module>
    from tensorflow.python import pywrap_tensorflow  # pylint: disable=unused-import
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/__init__.py", line 49, in <module>
    from tensorflow.python import pywrap_tensorflow
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/pywrap_tensorflow.py", line 74, in <module>
    raise ImportError(msg)
ImportError: Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/pywrap_tensorflow.py", line 58, in <module>
    from tensorflow.python.pywrap_tensorflow_internal import *
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 28, in <module>
    _pywrap_tensorflow_internal = swig_import_helper()
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 24, in swig_import_helper
    _mod = imp.load_module('_pywrap_tensorflow_internal', fp, pathname, description)
  File "/usr/lib/python3.6/imp.py", line 243, in load_module
    return load_dynamic(name, filename, file)
  File "/usr/lib/python3.6/imp.py", line 343, in load_dynamic
    return _load(spec)
ImportError: libcublas.so.10.0: cannot open shared object file: No such file or directory

Failed to load the native TensorFlow runtime.

Did you make sure you installed CUDA10.0? Or which version is installed?

priyakansal commented 5 years ago

@dattran2346 @mostafaelhoushi Do i need to install cuda and cudnn during the build of the docker image also, like this :

conda install -c fragcolor cuda10.0
priyakansal commented 5 years ago

@dattran2346 i have exported the paths as you suggested, but still not working...

dattran2346 commented 5 years ago

Tensorflow-gpu conda

@priyakansal, you may need to run conda uninstall cuda10.0 and then conda install tensorflow-gpu. @loretoparisi, maybe try a lower version of TensorFlow, install with conda, or even upgrade your CUDA version 🤔

PS 1: For installing cudatoolkit and cudnn, I found this guide very useful.
PS 2: Installing cudatoolkit and cudnn from the .run file puts the libraries in /usr/local/, while installing from the .deb file puts them in /usr/lib/x86_64-linux-gnu/, so your $PATH and $LD_LIBRARY_PATH need to change accordingly. Installing cudatoolkit and cudnn with conda puts the libraries in ~/miniconda3/envs/<name>/lib, and you do not need to export anything.
PS 3: What if I have installed cudatoolkit and cudnn and also installed tensorflow-gpu using conda? tensorflow-gpu will use the libraries installed within the conda environment.

Hope this helps. Correct me if I'm wrong 😄 Cheers

priyakansal commented 5 years ago

@dattran2346 Thank you so much for such a detailed explanation. Even when I run conda install tensorflow-gpu, it does not work; however, I have not tried it together with conda uninstall cuda10.0. The problem here is that I also want to install tensorflow-serving-api-gpu, which is not available through conda install, so I need to install it using pip, and when I do that I get the same error. Please note that I am doing all of this inside Docker. On my local machine (Ubuntu), everything works fine.

loretoparisi commented 5 years ago

What I did was this https://gist.github.com/loretoparisi/4a096fc3625f60403c8734de9660cbcc

add-apt-repository ppa:jonathonf/python-3.6
apt-get update && apt-get install -y python3.6
curl https://bootstrap.pypa.io/get-pip.py > get-pip.py
python3.6 get-pip.py
pip3 uninstall tensorflow-gpu
pip3.6 install tensorflow-gpu==1.12.0
python3.6 -c "import tensorflow as tf; print(tf.__version__);"

Basically you will get Python 3.6, CUDA 9 and TF 1.12.0. We have to remove TF-GPU 1.13.0 and then install TF-GPU 1.12.0.

priyakansal commented 5 years ago

@loretoparisi Hi, sorry, I am a bit new to Docker. When I am using some image, either or < tensorflow-serving: latest -gpu>, with docker run or nvidia-docker run, and import the TensorFlow-related packages, everything works fine. But when I build my custom image, with anaconda as the base image, using the docker-compose or nvidia-docker-compose build command, it does not work.

mostafaelhoushi commented 5 years ago

@dattran2346 i have exported the paths as you suggested, but still not working...

Can you try to search for the missing file libcublas.so.10.0 on your file system. e.g. by using

find / -name "libcublas.so.10.0"

and then when you find the path add it to LD_LIBRARY_PATH environment variable. If you can't find it, then you probably need to install the correct version.

priyakansal commented 5 years ago

@mostafaelhoushi When i am running this command find / -name "libcublas.so.10.0" the output is

/var/lib/docker/overlay2/33ff618e94595ffbdc09016439dc6a469fa8adc3ec3b5231f776d6065aab7968/diff/root/.cache/bazel/_bazel_root/e53bbb0b0da4e26d24b415310219b953/execroot/tf_serving/bazel-out/k8-opt/bin/tensorflow_serving/model_servers/tensorflow_model_server.runfiles/tf_serving/external/local_config_cuda/cuda/cuda/lib/libcublas.so.10.0
/var/lib/docker/overlay2/33ff618e94595ffbdc09016439dc6a469fa8adc3ec3b5231f776d6065aab7968/diff/root/.cache/bazel/_bazel_root/e53bbb0b0da4e26d24b415310219b953/execroot/tf_serving/bazel-out/k8-opt/bin/tensorflow_serving/model_servers/tensorflow_model_server.runfiles/tf_serving/_solib_local/_U@local_Uconfig_Ucuda_S_Scuda_Ccublas___Uexternal_Slocal_Uconfig_Ucuda_Scuda_Scuda_Slib/libcublas.so.10.0
/var/lib/docker/overlay2/33ff618e94595ffbdc09016439dc6a469fa8adc3ec3b5231f776d6065aab7968/diff/root/.cache/bazel/_bazel_root/e53bbb0b0da4e26d24b415310219b953/execroot/tf_serving/bazel-out/k8-opt/bin/tensorflow_serving/model_servers/tensorflow_model_server.runfiles/local_config_cuda/cuda/cuda/lib/libcublas.so.10.0
/var/lib/docker/overlay2/33ff618e94595ffbdc09016439dc6a469fa8adc3ec3b5231f776d6065aab7968/diff/root/.cache/bazel/_bazel_root/e53bbb0b0da4e26d24b415310219b953/execroot/tf_serving/bazel-out/k8-opt/bin/_solib_local/_U@local_Uconfig_Ucuda_S_Scuda_Ccublas___Uexternal_Slocal_Uconfig_Ucuda_Scuda_Scuda_Slib/libcublas.so.10.0
/var/lib/docker/overlay2/33ff618e94595ffbdc09016439dc6a469fa8adc3ec3b5231f776d6065aab7968/diff/root/.cache/bazel/_bazel_root/e53bbb0b0da4e26d24b415310219b953/execroot/tf_serving/bazel-out/k8-opt/genfiles/external/local_config_cuda/cuda/cuda/lib/libcublas.so.10.0
/var/lib/docker/overlay2/33ff618e94595ffbdc09016439dc6a469fa8adc3ec3b5231f776d6065aab7968/diff/root/.cache/bazel/_bazel_root/e53bbb0b0da4e26d24b415310219b953/execroot/tf_serving/bazel-out/host/genfiles/external/local_config_cuda/cuda/cuda/lib/libcublas.so.10.0
/var/lib/docker/overlay2/97cb0c942535cde4622f53bf094251cd1aef1cfc744e8ddda1472ee691f87618/diff/usr/local/cuda-10.0/targets/x86_64-linux/lib/libcublas.so.10.0
/var/lib/docker/overlay2/2fb234250d278545f55a004fcd436b4cba5e847c40503b990ffe800f3b440cb5/diff/usr/local/cuda-10.0/targets/x86_64-linux/lib/libcublas.so.10.0
/var/lib/docker/overlay2/c704b6be3bc1a5d25119fa46216a4e64f872d8001d8bed6d40930f6420ffb091/diff/usr/local/cuda-10.0/targets/x86_64-linux/lib/libcublas.so.10.0
/usr/local/cuda-10.0/lib64/libcublas.so.10.0
mostafaelhoushi commented 5 years ago

@mostafaelhoushi When i am running this command find / -name "libcublas.so.10.0" the output is

/var/lib/docker/overlay2/33ff618e94595ffbdc09016439dc6a469fa8adc3ec3b5231f776d6065aab7968/diff/root/.cache/bazel/_bazel_root/e53bbb0b0da4e26d24b415310219b953/execroot/tf_serving/bazel-out/k8-opt/bin/tensorflow_serving/model_servers/tensorflow_model_server.runfiles/tf_serving/external/local_config_cuda/cuda/cuda/lib/libcublas.so.10.0
/var/lib/docker/overlay2/33ff618e94595ffbdc09016439dc6a469fa8adc3ec3b5231f776d6065aab7968/diff/root/.cache/bazel/_bazel_root/e53bbb0b0da4e26d24b415310219b953/execroot/tf_serving/bazel-out/k8-opt/bin/tensorflow_serving/model_servers/tensorflow_model_server.runfiles/tf_serving/_solib_local/_U@local_Uconfig_Ucuda_S_Scuda_Ccublas___Uexternal_Slocal_Uconfig_Ucuda_Scuda_Scuda_Slib/libcublas.so.10.0
/var/lib/docker/overlay2/33ff618e94595ffbdc09016439dc6a469fa8adc3ec3b5231f776d6065aab7968/diff/root/.cache/bazel/_bazel_root/e53bbb0b0da4e26d24b415310219b953/execroot/tf_serving/bazel-out/k8-opt/bin/tensorflow_serving/model_servers/tensorflow_model_server.runfiles/local_config_cuda/cuda/cuda/lib/libcublas.so.10.0
/var/lib/docker/overlay2/33ff618e94595ffbdc09016439dc6a469fa8adc3ec3b5231f776d6065aab7968/diff/root/.cache/bazel/_bazel_root/e53bbb0b0da4e26d24b415310219b953/execroot/tf_serving/bazel-out/k8-opt/bin/_solib_local/_U@local_Uconfig_Ucuda_S_Scuda_Ccublas___Uexternal_Slocal_Uconfig_Ucuda_Scuda_Scuda_Slib/libcublas.so.10.0
/var/lib/docker/overlay2/33ff618e94595ffbdc09016439dc6a469fa8adc3ec3b5231f776d6065aab7968/diff/root/.cache/bazel/_bazel_root/e53bbb0b0da4e26d24b415310219b953/execroot/tf_serving/bazel-out/k8-opt/genfiles/external/local_config_cuda/cuda/cuda/lib/libcublas.so.10.0
/var/lib/docker/overlay2/33ff618e94595ffbdc09016439dc6a469fa8adc3ec3b5231f776d6065aab7968/diff/root/.cache/bazel/_bazel_root/e53bbb0b0da4e26d24b415310219b953/execroot/tf_serving/bazel-out/host/genfiles/external/local_config_cuda/cuda/cuda/lib/libcublas.so.10.0
/var/lib/docker/overlay2/97cb0c942535cde4622f53bf094251cd1aef1cfc744e8ddda1472ee691f87618/diff/usr/local/cuda-10.0/targets/x86_64-linux/lib/libcublas.so.10.0
/var/lib/docker/overlay2/2fb234250d278545f55a004fcd436b4cba5e847c40503b990ffe800f3b440cb5/diff/usr/local/cuda-10.0/targets/x86_64-linux/lib/libcublas.so.10.0
/var/lib/docker/overlay2/c704b6be3bc1a5d25119fa46216a4e64f872d8001d8bed6d40930f6420ffb091/diff/usr/local/cuda-10.0/targets/x86_64-linux/lib/libcublas.so.10.0
/usr/local/cuda-10.0/lib64/libcublas.so.10.0

OK. I see libcublas.so.10.0 is found in /usr/local/cuda-10.0/lib64/. Try running this command:

export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda-10.0/lib64/

and try again.

NOTE: I see the library is also present in your Docker overlay directories. I am not familiar with Docker, so maybe someone else can help there. But try the above command and see.
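
Before or after exporting the variable, it can help to confirm which directories on LD_LIBRARY_PATH actually contain the file. The following is a minimal Python sketch; the helper name dirs_containing is hypothetical, and it only inspects the path string without changing how the loader behaves. On glibc systems the dynamic loader reads LD_LIBRARY_PATH when the process starts, so the export has to happen in the shell before Python is launched.

```python
import os


def dirs_containing(libname, search_path):
    """Return the directories in a colon-separated path that contain libname."""
    hits = []
    for directory in search_path.split(":"):
        # Skip empty entries, which appear when the variable is unset or has a stray colon.
        if directory and os.path.exists(os.path.join(directory, libname)):
            hits.append(directory)
    return hits


if __name__ == "__main__":
    path = os.environ.get("LD_LIBRARY_PATH", "")
    found = dirs_containing("libcublas.so.10.0", path)
    print(found or "libcublas.so.10.0 is not in any LD_LIBRARY_PATH directory")
```

If the list is empty after the export, the variable was probably set in a different shell than the one that started Python.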

littlehome-eugene commented 5 years ago

It happened to me when I installed cuda-10.1 instead of cuda-10.0; downgrading to 10.0 fixed it.

priyakansal commented 5 years ago

@littlehome-eugene But I am only using cuda-10.0. Btw, have you done it for Docker?

plche commented 5 years ago

Same problem. My CUDA version is 10.1, but the libcublas.so.10.0 file is not in the lib64 directory. I am installing tensorflow-gpu with the command 'pip install tensorflow-gpu'.

It seems that the versioned libcublas file was removed by CUDA 10.

After installing CUDA 10 I have found libcublas.so.10 under /usr/lib/x86_64-linux-gnu/. So you need to add /usr/lib/x86_64-linux-gnu/ to your library path by calling:

> export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/lib/x86_64-linux-gnu/

And also since TensorFlow is looking for libcublas.so.10.0 rather than libcublas.so.10 (without the last .0) you need to create a symlink:

ln -s /usr/lib/x86_64-linux-gnu/libcublas.so.10 /usr/lib/x86_64-linux-gnu/libcublas.so.10

There is a typo in the last command. It should be:

ln -s /usr/lib/x86_64-linux-gnu/libcublas.so.10 /usr/lib/x86_64-linux-gnu/libcublas.so.10.0

Also consider issuing that command with root privileges (sudo), or you will get a permission denied error.
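
The corrected symlink step can also be sketched in Python. The function name add_soname_alias is hypothetical. The directory is a parameter so the sketch can be tried on a scratch directory first; writing to /usr/lib/x86_64-linux-gnu/ still needs root. Pointing the 10.0 name at a different cuBLAS build is a workaround reported in this thread, not something guaranteed to be binary compatible.

```python
import os


def add_soname_alias(directory, existing="libcublas.so.10", alias="libcublas.so.10.0"):
    """Create directory/alias as a symlink to directory/existing; return the alias path."""
    target = os.path.join(directory, existing)
    link = os.path.join(directory, alias)
    if not os.path.exists(target):
        raise FileNotFoundError(target)
    # lexists is also true for a dangling symlink, so an old link is never overwritten.
    if not os.path.lexists(link):
        os.symlink(target, link)
    return link
```

Calling add_soname_alias("/usr/lib/x86_64-linux-gnu") as root would correspond to the ln -s command, with the difference that an existing link is left untouched.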

mostafaelhoushi commented 5 years ago

Thanks @plche! I fixed it.

ghost commented 5 years ago

Just remove everything related to 10.1 and downgrade to CUDA 10.0, and it will work. Nothing else worked for me.

codexponent commented 5 years ago

@mostafaelhoushi has given the best solution. Anyone who is confused, see their answer above (adding /usr/local/cuda-10.0/lib64/ to LD_LIBRARY_PATH). :)

stiege commented 5 years ago

Unfortunately my underlying question is a bit unrelated to this thread - I have the wrong version installed. However I'm hoping there's someone more knowledgeable here that can answer my actual query below.

I'm running arch linux; I installed tensorflow 2:

pip install tensorflow-gpu==2.0.0-alpha0

I had previously been running an older version of the cuda and cudnn packages in order to work with tensorflow 1. I removed these and installed the latest in the AUR:

[stiege@archie ~]$ sudo pacman -S cuda cudnn
[sudo] password for stiege: 
warning: cuda-10.1.105-6 is up to date -- reinstalling
warning: cudnn-7.5.0.56-1 is up to date -- reinstalling
resolving dependencies...
looking for conflicting packages...

Packages (2) cuda-10.1.105-6  cudnn-7.5.0.56-1

Total Installed Size:  4390.26 MiB
Net Upgrade Size:         0.00 MiB

:: Proceed with installation? [Y/n] Y

Note the cuda version is actually 10.1; however I get the same error as others in the thread:

ImportError: libcublas.so.10.0: cannot open shared object file: No such file or directory

But -

[stiege@archie ~]$ ldconfig -p 2>/dev/null | grep libcublas.so
    libcublas.so.10 (libc6,x86-64) => /opt/cuda/lib64/libcublas.so.10
    libcublas.so (libc6,x86-64) => /opt/cuda/lib64/libcublas.so

I can find nothing about why only these two libcublas.so* links are created - why is it just for the major version and not the minor and patch versions? Is this by a convention / standard? Links/Docs? I also still can't find these in the "standard place" - which I assumed is what ldconfig was doing:

[stiege@archie ~]$ find /usr/lib/ -name libcublas.so*
[stiege@archie ~]$ find /lib/ -name libcublas.so*
[stiege@archie ~]$
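
The ldconfig -p listing above can also be inspected programmatically, which makes it easy to see that the name TensorFlow asks for is not among the cached entries. A minimal sketch, assuming the "name (flags) => path" line format shown above; parse_ldconfig_cache is a hypothetical helper and SAMPLE is copied from the output in this comment.

```python
import re

# Matches lines such as "libcublas.so.10 (libc6,x86-64) => /opt/cuda/lib64/libcublas.so.10".
_ENTRY = re.compile(r"^\s*(\S+)\s+\(([^)]*)\)\s+=>\s+(\S+)\s*$")


def parse_ldconfig_cache(text, prefix):
    """Map each cached name starting with prefix to the path it resolves to."""
    entries = {}
    for line in text.splitlines():
        match = _ENTRY.match(line)
        if match and match.group(1).startswith(prefix):
            # Keep the first entry seen for a name if it appears more than once.
            entries.setdefault(match.group(1), match.group(3))
    return entries


SAMPLE = """\
    libcublas.so.10 (libc6,x86-64) => /opt/cuda/lib64/libcublas.so.10
    libcublas.so (libc6,x86-64) => /opt/cuda/lib64/libcublas.so
"""

cached = parse_ldconfig_cache(SAMPLE, "libcublas.so")
print(sorted(cached))                  # ['libcublas.so', 'libcublas.so.10']
print("libcublas.so.10.0" in cached)   # False
```

On a live system the text could come from subprocess.run(["ldconfig", "-p"], capture_output=True, text=True).stdout instead of SAMPLE.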

And this is what makes me concerned about the issue of the actual thread: it appears that even libcublas.so.10.1 isn't available:

In [38]: l = ctypes.cdll.LoadLibrary("libcublas.so")                                                                          

In [39]: l = ctypes.cdll.LoadLibrary("libcublas.so.10")                                                                       

In [40]: l = ctypes.cdll.LoadLibrary("libcublas.so.10.1")                                                                     
---------------------------------------------------------------------------
OSError                                   Traceback (most recent call last)
<ipython-input-40-9eb0347ef2f9> in <module>
----> 1 l = ctypes.cdll.LoadLibrary("libcublas.so.10.1")

[stiege@archie ~]$ cat /etc/ld.so.conf.d/cuda.conf
/opt/cuda/lib64
/opt/cuda/nvvm/lib64
/opt/cuda/extras/CUPTI/lib64

^ Again there are lots of shared objects in these directories; I'm not sure why only the 2 mentioned above end up being processed by ldconfig; is this basically all by the underlying convention? It seems reasonable to me to ask for a specific minor version, however much of the guidance (I could find at short notice) seems to really push that only the MAJOR version is important - https://unix.stackexchange.com/questions/475/how-do-so-shared-object-numbers-work


Found libcrypto as a counter-example to the convention I inferred: it links to a major.minor version, and the major version alone is not provided.

In [50]: l = ctypes.cdll.LoadLibrary("libcrypto.so.1.0.0")                                                                    

In [51]: l = ctypes.cdll.LoadLibrary("libcrypto.so.1.1")                                                                      

In [52]: l = ctypes.cdll.LoadLibrary("libcrypto.so.1")                                                                        
---------------------------------------------------------------------------
OSError                                   Traceback (most recent call last)
<ipython-input-52-3d67cbbd3826> in <module>
----> 1 l = ctypes.cdll.LoadLibrary("libcrypto.so.1")

But this is exactly what I'd expect from the listing in /lib/:

[stiege@archie tensorflow]$ find /lib/ -name libcrypto.so*
/lib/libcrypto.so
/lib/libcrypto.so.1.1
/lib/libcrypto.so.1.0.0

So my main question is: even though /opt/cuda/lib64/libcublas.so.10.1 seems to be available and configured via the ldconfig system, why is it unavailable for loading from Python?
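
A likely explanation, offered as an assumption rather than something confirmed in this thread: on glibc, a bare library name is resolved through LD_LIBRARY_PATH, the ld.so cache, and the default directories, and the ldconfig -p output above lists only libcublas.so.10 and libcublas.so. A request for libcublas.so.10.1 would then have no cache entry to match, while an absolute path skips the search entirely. The sketch below probes each case; can_load is a hypothetical helper.

```python
import ctypes


def can_load(name):
    """Return True if the dynamic loader can open name, False otherwise."""
    try:
        ctypes.CDLL(name)
        return True
    except OSError:
        return False


if __name__ == "__main__":
    candidates = [
        "libcublas.so.10",                     # listed by ldconfig -p above
        "libcublas.so.10.1",                   # bare name with no cache entry
        "/opt/cuda/lib64/libcublas.so.10.1",   # absolute path bypasses the search
    ]
    for name in candidates:
        print(name, can_load(name))
```

If only the absolute path and the cached name load, the file itself is fine and the difference lies in how the name is looked up.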


Weird

[stiege@archie tensorflow]$ sudo cp /opt/cuda/lib64/libcublas.so.10.1 /opt/cuda/lib64/libcublas.so.10.2
[stiege@archie tensorflow]$ ldconfig -v | grep libcublas
ldconfig: Can't unlink /opt/cuda/lib64/libcublas.so.10
    libcublas.so.10 -> libcublas.so.10.2 (SKIPPED)
    libcublasLt.so.10 -> libcublasLt.so.10.1.0.105
[stiege@archie tensorflow]$ sudo cp /opt/cuda/lib64/libcublas.so.10.1 /opt/cuda/lib64/libcublas.so.11
[stiege@archie tensorflow]$ sudo cp /opt/cuda/lib64/libcublas.so.10.1 /opt/cuda/lib64/libcublas.so.11.2
[stiege@archie tensorflow]$ ldconfig -v | grep libcublas
ldconfig: Can't unlink /opt/cuda/lib64/libcublas.so.10
    libcublas.so.10 -> libcublas.so.11.2 (SKIPPED)
    libcublasLt.so.10 -> libcublasLt.so.10.1.0.105

I was expecting a new key "libcublas.so.11" to be created, but instead ldconfig seems to be trying to link 10 to 11.2 - no idea how this works.
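
One way to investigate this: ldconfig names its links after the SONAME recorded inside each library file rather than after the file name, which would be consistent with the renamed copies above still being linked as libcublas.so.10. The command readelf -d prints that field, and the sketch below extracts it from readelf-style output. The sample line is illustrative only (its SONAME value is inferred from the ldconfig output above, not captured from a real run), and soname_from_readelf is a hypothetical helper.

```python
import re


def soname_from_readelf(dynamic_section):
    """Return the SONAME from `readelf -d` output, or None if there is none."""
    match = re.search(r"\(SONAME\)\s+Library soname:\s+\[([^\]]+)\]", dynamic_section)
    return match.group(1) if match else None


# Illustrative line in readelf's format; the value is an assumption, not real output.
SAMPLE = " 0x000000000000000e (SONAME)             Library soname: [libcublas.so.10]"
print(soname_from_readelf(SAMPLE))  # libcublas.so.10
```

For a real file, the input could come from subprocess.run(["readelf", "-d", path], capture_output=True, text=True).stdout.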

mandeezh commented 5 years ago

I had the same problem. After removing tensorflow 1.13 and installing 1.12, the problem was solved!

pip install tensorflow-gpu==1.12.0

My environment is nvidia-driver-390 with CUDA 9.0.
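
The version pairings mentioned in this thread (TF 1.12 with CUDA 9.0, TF 1.13 with CUDA 10.0) can be written down as a small lookup to check which libcublas file a given install is expected to load. This is a minimal sketch: expected_cublas is a hypothetical helper, the table covers only the two pairings named here, and it assumes the file name follows the libcublas.so.<CUDA version> pattern seen in the error message.

```python
# Pairings mentioned in this thread; other releases are deliberately left out.
CUDA_FOR_TF = {"1.12": "9.0", "1.13": "10.0"}


def expected_cublas(tf_version):
    """Return the libcublas file name a tensorflow-gpu release is expected to load."""
    major_minor = ".".join(tf_version.split(".")[:2])
    cuda = CUDA_FOR_TF.get(major_minor)
    return None if cuda is None else "libcublas.so." + cuda


print(expected_cublas("1.12.0"))  # libcublas.so.9.0
print(expected_cublas("1.13.1"))  # libcublas.so.10.0
```

A return value of None means the release is not covered by the table, not that it needs no CUDA library.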