jinzishuai / learn2deeplearn

A repository of codes in learning deep learning
GNU General Public License v3.0

Install TensorFlow on Ubuntu-16.04 with GPU Acceleration #36

Closed jinzishuai closed 6 years ago

jinzishuai commented 6 years ago

https://www.tensorflow.org/install/install_linux

Version Notes

jinzishuai commented 6 years ago

1. Install CUDA Toolkit 9.0

Ref: http://docs.nvidia.com/cuda/cuda-installation-guide-linux/#axzz4VZnqTJ2A

Just follow the documentation. In essence: apt-get install cuda

jinzishuai commented 6 years ago

2. Install NVIDIA Driver

ref: http://www.webupd8.org/2016/06/how-to-install-latest-nvidia-drivers-in.html

Confirmation

seki@Ubuntu-Shi-Dell-Precision-M3800:~$ lspci -k|grep -i nvidia
02:00.0 3D controller: NVIDIA Corporation GK107GLM [Quadro K1100M] (rev a1)
    Kernel driver in use: nvidia
    Kernel modules: nvidiafb, nouveau, nvidia_387_drm, nvidia_387
seki@Ubuntu-Shi-Dell-Precision-M3800:~$ lsmod|grep nvidia
nvidia_drm             45056  0
nvidia_modeset        897024  1 nvidia_drm
nvidia              13815808  1 nvidia_modeset
drm_kms_helper        151552  3 nouveau,i915,nvidia_drm
drm                   352256  6 nouveau,i915,ttm,nvidia_drm,drm_kms_helper
seki@Ubuntu-Shi-Dell-Precision-M3800:~$ cat /proc/driver/nvidia/gpus/0000\:02\:00.0/information 
Model:       Quadro K1100M
IRQ:         16
GPU UUID:    GPU-????????-????-????-????-????????????
Video BIOS:      ??.??.??.??.??
Bus Type:    PCIe
DMA Size:    36 bits
DMA Mask:    0xfffffffff
Bus Location:    0000:02:00.0
Device Minor:    0
seki@Ubuntu-Shi-Dell-Precision-M3800:~$ 
jinzishuai commented 6 years ago

Verify CUDA and NVIDIA Driver with CUDA Samples

http://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#post-installation-actions

Problem: CUDA driver version is insufficient for CUDA runtime version

seki@Ubuntu-Shi-Dell-Precision-M3800:/mnt/ShiJin/src/cuda-samples/NVIDIA_CUDA-9.0_Samples/1_Utilities/deviceQuery$ ./deviceQuery 
./deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

cudaGetDeviceCount returned 35
-> CUDA driver version is insufficient for CUDA runtime version
Result = FAIL
jinzishuai commented 6 years ago

Tried reinstalling cuda: it did not help

seki@Ubuntu-Shi-Dell-Precision-M3800:/usr/local/cuda-9.0/bin$ sudo apt install cuda
Reading package lists... Done
Building dependency tree       
Reading state information... Done
The following packages were automatically installed and are no longer required:
  iucode-tool linux-headers-4.4.0-87
Use 'sudo apt autoremove' to remove them.
The following additional packages will be installed:
  cuda-9-0 cuda-demo-suite-9-0 cuda-drivers cuda-runtime-9-0 libcuda1-384 nvidia-384 nvidia-384-dev nvidia-modprobe
  nvidia-opencl-icd-384
The following packages will be REMOVED:
  libcuda1-387 nvidia-387 nvidia-opencl-icd-387
The following NEW packages will be installed:
  cuda cuda-9-0 cuda-demo-suite-9-0 cuda-drivers cuda-runtime-9-0 libcuda1-384 nvidia-384 nvidia-384-dev nvidia-modprobe
  nvidia-opencl-icd-384
0 upgraded, 10 newly installed, 3 to remove and 311 not upgraded.
Need to get 80.6 MB/84.5 MB of archives.
After this operation, 353 kB disk space will be freed.
Do you want to continue? [Y/n] Y

It seems that the cuda package depends on nvidia-384 and removes the already installed nvidia-387.

But the same problem persisted after switching to nvidia-384.

jinzishuai commented 6 years ago

Check Version Support Matrix: CUDA-9.0 does work with nvidia-384

https://github.com/NVIDIA/nvidia-docker/wiki/CUDA#requirements
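
The check against the support matrix can be sketched as a small version comparison. This is only an illustration: the minimum driver version in the table below (384.81 for CUDA 9.0) is an assumption taken from the matrix as I remember it and should be verified against the linked page, and the function names are hypothetical.

```python
import re

# Assumption: minimum Linux driver version per CUDA release.
# Verify these numbers against the support matrix before relying on them.
MIN_DRIVER = {"9.0": (384, 81)}


def parse_version(text):
    """Turn a dotted version string such as '384.90' into a tuple of ints."""
    return tuple(int(part) for part in re.findall(r"\d+", text))


def driver_supports(cuda_version, driver_version):
    """Return True when the driver meets the assumed minimum for cuda_version."""
    return parse_version(driver_version) >= MIN_DRIVER[cuda_version]
```

For example, `driver_supports("9.0", "384.90")` is true under this table, which matches the conclusion above that nvidia-384 itself is not the problem.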

jinzishuai commented 6 years ago

Solution: Reconfigure the GL library path (the x86_64-linux-gnu_gl_conf alternative)

seki@Ubuntu-Shi-Dell-Precision-M3800:~/src/cuda-samples/NVIDIA_CUDA-9.0_Samples/1_Utilities/deviceQuery$ sudo update-alternatives --config x86_64-linux-gnu_gl_conf
[sudo] password for seki: 
There are 3 choices for the alternative x86_64-linux-gnu_gl_conf (providing /etc/ld.so.conf.d/x86_64-linux-gnu_GL.conf).

  Selection    Path                                       Priority   Status
------------------------------------------------------------
  0            /usr/lib/nvidia-384/ld.so.conf              8604      auto mode
  1            /usr/lib/nvidia-384-prime/ld.so.conf        8603      manual mode
  2            /usr/lib/nvidia-384/ld.so.conf              8604      manual mode
* 3            /usr/lib/x86_64-linux-gnu/mesa/ld.so.conf   500       manual mode

Press <enter> to keep the current choice[*], or type selection number: 0
update-alternatives: using /usr/lib/nvidia-384/ld.so.conf to provide /etc/ld.so.conf.d/x86_64-linux-gnu_GL.conf (x86_64-linux-gnu_gl_conf) in auto mode
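
To see which copy of a library the dynamic linker will pick up after switching the alternative, one option is to inspect the output of `ldconfig -p`. The sketch below is a rough helper and not part of the original steps; the function names are hypothetical, and using `libGL.so.1` as the library to look up is an assumption based on the alternative being the GL configuration.

```python
import subprocess


def resolve_library(name, cache_text):
    """Return the paths that an `ldconfig -p` listing maps `name` to."""
    paths = []
    for line in cache_text.splitlines():
        # Entries look like: "libGL.so.1 (libc6,x86-64) => /path/libGL.so.1"
        entry, sep, path = line.strip().partition(" => ")
        if sep and entry.split(" ")[0] == name:
            paths.append(path)
    return paths


def read_linker_cache():
    """Run `ldconfig -p`; requires ldconfig to be on PATH (often /sbin)."""
    result = subprocess.run(["ldconfig", "-p"], stdout=subprocess.PIPE,
                            universal_newlines=True, check=True)
    return result.stdout
```

Calling `resolve_library("libGL.so.1", read_linker_cache())` should list a path under /usr/lib/nvidia-384 when the NVIDIA configuration is active, and a mesa path otherwise.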

Working Example

seki@Ubuntu-Shi-Dell-Precision-M3800:~/src/cuda-samples/NVIDIA_CUDA-9.0_Samples/1_Utilities/deviceQuery$ ./deviceQuery 
./deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "Quadro K1100M"
  CUDA Driver Version / Runtime Version          9.0 / 9.0
  CUDA Capability Major/Minor version number:    3.0
  Total amount of global memory:                 2002 MBytes (2098724864 bytes)
  ( 2) Multiprocessors, (192) CUDA Cores/MP:     384 CUDA Cores
  GPU Max Clock rate:                            706 MHz (0.71 GHz)
  Memory Clock rate:                             1400 Mhz
  Memory Bus Width:                              128-bit
  L2 Cache Size:                                 262144 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
  Maximum Layered 1D Texture Size, (num) layers  1D=(16384), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(16384, 16384), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  2048
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 1 copy engine(s)
  Run time limit on kernels:                     No
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  Device supports Unified Addressing (UVA):      Yes
  Supports Cooperative Kernel Launch:            No
  Supports MultiDevice Co-op Kernel Launch:      No
  Device PCI Domain ID / Bus ID / location ID:   0 / 2 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 9.0, CUDA Runtime Version = 9.0, NumDevs = 1
Result = PASS
seki@Ubuntu-Shi-Dell-Precision-M3800:~/src/cuda-samples/NVIDIA_CUDA-9.0_Samples/1_Utilities/deviceQuery$ 
jinzishuai commented 6 years ago

Also note my ~/.bashrc

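# Prepend the CUDA and driver libraries; the :+ expansion avoids a trailing colon when LD_LIBRARY_PATH is unset
export LD_LIBRARY_PATH=/usr/local/cuda-9.0/lib64:/usr/lib/nvidia-384${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}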
export PATH=/usr/local/cuda/bin:/usr/lib/nvidia-384/bin:$PATH
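
The `${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}` expansion appends the old value with a leading colon only when it is set and non-empty. That matters because an empty entry in LD_LIBRARY_PATH is treated as the current directory. The same logic, written as a hypothetical Python helper purely for illustration:

```python
def prepend_paths(new_dirs, current):
    """Join new_dirs in front of current without leaving an empty entry."""
    parts = list(new_dirs)
    if current:  # mirrors the ${VAR:+:${VAR}} guard in the shell
        parts.append(current)
    return ":".join(parts)
```

With an empty `current`, the result has no trailing colon; with an existing value, the old entries are kept after the new ones.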
jinzishuai commented 6 years ago

bandwidth test

seki@Ubuntu-Shi-Dell-Precision-M3800:~/src/cuda-samples/NVIDIA_CUDA-9.0_Samples/1_Utilities/bandwidthTest$ ./bandwidthTest 
[CUDA Bandwidth Test] - Starting...
Running on...

 Device 0: Quadro K1100M
 Quick Mode

 Host to Device Bandwidth, 1 Device(s)
 PINNED Memory Transfers
   Transfer Size (Bytes)    Bandwidth(MB/s)
   33554432         9731.7

 Device to Host Bandwidth, 1 Device(s)
 PINNED Memory Transfers
   Transfer Size (Bytes)    Bandwidth(MB/s)
   33554432         9711.7

 Device to Device Bandwidth, 1 Device(s)
 PINNED Memory Transfers
   Transfer Size (Bytes)    Bandwidth(MB/s)
   33554432         27980.0

Result = PASS

NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.
seki@Ubuntu-Shi-Dell-Precision-M3800:~/src/cuda-samples/NVIDIA_CUDA-9.0_Samples/1_Utilities/bandwidthTest$ 
jinzishuai commented 6 years ago

After reboot, the nvidia driver does not load, so I have to change back to /usr/lib/x86_64-linux-gnu/mesa/ld.so.conf

root@Ubuntu-Shi-Dell-Precision-M3800:~# update-alternatives  --config x86_64-linux-gnu_gl_conf 
There are 3 choices for the alternative x86_64-linux-gnu_gl_conf (providing /etc/ld.so.conf.d/x86_64-linux-gnu_GL.conf).

  Selection    Path                                       Priority   Status
------------------------------------------------------------
  0            /usr/lib/nvidia-384/ld.so.conf              8604      auto mode
  1            /usr/lib/nvidia-384-prime/ld.so.conf        8603      manual mode
  2            /usr/lib/nvidia-384/ld.so.conf              8604      manual mode
* 3            /usr/lib/x86_64-linux-gnu/mesa/ld.so.conf   500       manual mode

At this state

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
root@Ubuntu-Shi-Dell-Precision-M3800:~#

Sample code fails

seki@Ubuntu-Shi-Dell-Precision-M3800:~/src/cuda-samples/NVIDIA_CUDA-9.0_Samples/1_Utilities/bandwidthTest$ ./bandwidthTest
[CUDA Bandwidth Test] - Starting...
Running on...

cudaGetDeviceProperties returned 30
-> unknown error
CUDA error at bandwidthTest.cu:242 code=30(cudaErrorUnknown) "cudaSetDevice(currentDevice)"

jinzishuai commented 6 years ago

Summary

This way

jinzishuai commented 6 years ago

3. Install cuDNN 7

  1. register and download from https://developer.nvidia.com/cudnn
  2. Navigate to your directory containing cuDNN Debian file.
  3. Install the runtime library, for example:

    sudo dpkg -i libcudnn7_7.0.3.11-1+cuda9.0_amd64.deb

  4. Install the developer library, for example:

    sudo dpkg -i libcudnn7-dev_7.0.3.11-1+cuda9.0_amd64.deb

  5. Install the code samples and the cuDNN Library User Guide, for example:

    sudo dpkg -i libcudnn7-doc_7.0.3.11-1+cuda9.0_amd64.deb

jinzishuai commented 6 years ago

cuDNN-Installation-Guide.pdf

jinzishuai commented 6 years ago

Verify cuDNN

seki@Ubuntu-Shi-Dell-Precision-M3800:/mnt/ShiJin/src/cuDNN$ cp -r /usr/src/cudnn_samples_v7/
conv_sample/ mnistCUDNN/  RNN/         
seki@Ubuntu-Shi-Dell-Precision-M3800:/mnt/ShiJin/src/cuDNN$ cp -r /usr/src/cudnn_samples_v7/ .
seki@Ubuntu-Shi-Dell-Precision-M3800:/mnt/ShiJin/src/cuDNN$ ls
cudnn_samples_v7                        libcudnn7-dev_7.0.4.31-1+cuda9.0_amd64.deb
libcudnn7_7.0.4.31-1+cuda9.0_amd64.deb  libcudnn7-doc_7.0.4.31-1+cuda9.0_amd64.deb
seki@Ubuntu-Shi-Dell-Precision-M3800:/mnt/ShiJin/src/cuDNN$ cd cudnn_samples_v7/
seki@Ubuntu-Shi-Dell-Precision-M3800:/mnt/ShiJin/src/cuDNN/cudnn_samples_v7$ ls
conv_sample  mnistCUDNN  RNN
seki@Ubuntu-Shi-Dell-Precision-M3800:/mnt/ShiJin/src/cuDNN/cudnn_samples_v7$ cd mnistCUDNN/
seki@Ubuntu-Shi-Dell-Precision-M3800:/mnt/ShiJin/src/cuDNN/cudnn_samples_v7/mnistCUDNN$ ls
data  error_util.h  fp16_dev.cu  fp16_dev.h  fp16_emu.cpp  fp16_emu.h  FreeImage  gemv.h  Makefile  mnistCUDNN.cpp  readme.txt
seki@Ubuntu-Shi-Dell-Precision-M3800:/mnt/ShiJin/src/cuDNN/cudnn_samples_v7/mnistCUDNN$ make clean && make
rm -rf *o
rm -rf mnistCUDNN
/usr/local/cuda/bin/nvcc -ccbin g++ -I/usr/local/cuda/include -IFreeImage/include  -m64    -gencode arch=compute_30,code=sm_30 -gencode arch=compute_35,code=sm_35 -gencode arch=compute_50,code=sm_50 -gencode arch=compute_53,code=sm_53 -gencode arch=compute_53,code=compute_53 -o fp16_dev.o -c fp16_dev.cu
g++ -I/usr/local/cuda/include -IFreeImage/include   -o fp16_emu.o -c fp16_emu.cpp
g++ -I/usr/local/cuda/include -IFreeImage/include   -o mnistCUDNN.o -c mnistCUDNN.cpp
/usr/local/cuda/bin/nvcc -ccbin g++   -m64      -gencode arch=compute_30,code=sm_30 -gencode arch=compute_35,code=sm_35 -gencode arch=compute_50,code=sm_50 -gencode arch=compute_53,code=sm_53 -gencode arch=compute_53,code=compute_53 -o mnistCUDNN fp16_dev.o fp16_emu.o mnistCUDNN.o  -LFreeImage/lib/linux/x86_64 -LFreeImage/lib/linux -lcudart -lcublas -lcudnn -lfreeimage -lstdc++ -lm
seki@Ubuntu-Shi-Dell-Precision-M3800:/mnt/ShiJin/src/cuDNN/cudnn_samples_v7/mnistCUDNN$ ls
data          fp16_dev.cu  fp16_dev.o    fp16_emu.h  FreeImage  Makefile    mnistCUDNN.cpp  readme.txt
error_util.h  fp16_dev.h   fp16_emu.cpp  fp16_emu.o  gemv.h     mnistCUDNN  mnistCUDNN.o
seki@Ubuntu-Shi-Dell-Precision-M3800:/mnt/ShiJin/src/cuDNN/cudnn_samples_v7/mnistCUDNN$ ./mnistCUDNN 
cudnnGetVersion() : 7004 , CUDNN_VERSION from cudnn.h : 7004 (7.0.4)
Host compiler version : GCC 5.4.0
There are 1 CUDA capable devices on your machine :
device 0 : sms  2  Capabilities 3.0, SmClock 705.5 Mhz, MemSize (Mb) 2001, MemClock 1400.0 Mhz, Ecc=0, boardGroupID=0
Using device 0

Testing single precision
...
Test passed!
seki@Ubuntu-Shi-Dell-Precision-M3800:/mnt/ShiJin/src/cuDNN/cudnn_samples_v7/mnistCUDNN$
jinzishuai commented 6 years ago

After tensorflow-gpu installation, verification failed

seki@Ubuntu-Shi-Dell-Precision-M3800:/mnt/ShiJin/src/cuDNN/cudnn_samples_v7/mnistCUDNN$ python3
Python 3.5.2 (default, Sep 14 2017, 22:51:06) 
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/pywrap_tensorflow.py", line 58, in <module>
    from tensorflow.python.pywrap_tensorflow_internal import *
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 28, in <module>
    _pywrap_tensorflow_internal = swig_import_helper()
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 24, in swig_import_helper
    _mod = imp.load_module('_pywrap_tensorflow_internal', fp, pathname, description)
  File "/usr/lib/python3.5/imp.py", line 242, in load_module
    return load_dynamic(name, filename, file)
  File "/usr/lib/python3.5/imp.py", line 342, in load_dynamic
    return _load(spec)
ImportError: libcublas.so.8.0: cannot open shared object file: No such file or directory

It is looking for CUDA-8 while we have CUDA-9 installed

seki@Ubuntu-Shi-Dell-Precision-M3800:~$ ls /usr/local/cuda/lib64/libcublas* -lh
-rw-r--r-- 1 root root 67M Sep  2 04:39 /usr/local/cuda/lib64/libcublas_device.a
lrwxrwxrwx 1 root root  16 Sep  2 04:40 /usr/local/cuda/lib64/libcublas.so -> libcublas.so.9.0
lrwxrwxrwx 1 root root  20 Sep  2 04:40 /usr/local/cuda/lib64/libcublas.so.9.0 -> libcublas.so.9.0.176
-rw-r--r-- 1 root root 51M Sep  2 04:39 /usr/local/cuda/lib64/libcublas.so.9.0.176
-rw-r--r-- 1 root root 57M Sep  2 04:39 /usr/local/cuda/lib64/libcublas_static.a
seki@Ubuntu-Shi-Dell-Precision-M3800:~$ 
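
The mismatch can be spotted by comparing the library version named in the ImportError with the libcublas files that are actually installed. A minimal sketch, with hypothetical function names, that works on the error text and a directory listing:

```python
import re


def required_cuda(error_message):
    """Extract 'major.minor' from a message naming libcublas.so.X.Y."""
    match = re.search(r"libcublas\.so\.(\d+\.\d+)", error_message)
    return match.group(1) if match else None


def installed_cuda(lib_names):
    """Collect 'major.minor' versions from libcublas.so.X.Y file names."""
    versions = set()
    for name in lib_names:
        # Only the libcublas.so.X.Y symlink counts, not libcublas.so.X.Y.Z
        match = re.fullmatch(r"libcublas\.so\.(\d+\.\d+)", name)
        if match:
            versions.add(match.group(1))
    return sorted(versions)
```

Feeding it the error above gives "8.0", while `installed_cuda(os.listdir("/usr/local/cuda/lib64"))` on this machine would be expected to give only "9.0", which is why the import fails.
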
jinzishuai commented 6 years ago

After tf-1.5 was released

Installed 1.5 and it works

performance test as in #31

Problem with cuda compute capability 3.0

jinzishuai commented 6 years ago

Conclusion

The Linux tf-1.5 build does not support my GPU but the Windows build does.
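
A quick way to express this check is to compare the capability reported by deviceQuery with the minimum a given build was compiled for. The threshold of 3.5 below is an assumption about the prebuilt Linux wheel, not something stated in this thread, and the names are hypothetical.

```python
# Assumption: the prebuilt Linux wheel targets compute capability 3.5 and up.
MIN_CAPABILITY = (3, 5)


def is_supported(capability, minimum=MIN_CAPABILITY):
    """Return True when a 'major.minor' capability meets the minimum."""
    major, minor = (int(part) for part in capability.split("."))
    return (major, minor) >= minimum
```

Under that assumption, `is_supported("3.0")` is false for the Quadro K1100M. In TensorFlow 1.x, `tensorflow.python.client.device_lib.list_local_devices()` lists the devices the installed build actually accepts.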