jinzishuai commented 6 years ago

https://www.tensorflow.org/install/install_linux

[x] CUDA® Toolkit. For details, see NVIDIA's documentation. Ensure that you append the relevant Cuda pathnames to the LD_LIBRARY_PATH environment variable as described in the NVIDIA documentation.
[x] The NVIDIA drivers associated with CUDA Toolkit
[x] cuDNN. For details, see NVIDIA's documentation. Ensure that you create the CUDA_HOME environment variable as described in the NVIDIA documentation.
[x] GPU card with CUDA Compute Capability 3.0 or higher. See NVIDIA documentation for a list of supported GPU cards.
[x] The libcupti-dev library, which is the NVIDIA CUDA Profile Tools Interface. This library provides advanced profiling support. sudo apt-get install libcupti-dev
[x] Install Python3: sudo apt-get install python3-pip python3-dev
[x] Install Tensorflow with pip: sudo pip3 install tensorflow-gpu
[x] Confirm with https://github.com/jinzishuai/learn2deeplearn/wiki/Tensor-Flow-Installation-on-Ubuntu-14-16-without-GPU#python-3

Version Notes

As of Nov 15, 2017, the pip3-installed tensorflow-gpu (1.4.0) supports only CUDA-8 + cuDNN-6.
Version 1.5 is expected to support CUDA-9 + cuDNN-7 very soon (within this week).
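
The pairing above can be kept as a small lookup when deciding what to install. This is only a sketch: the table holds just the two combinations discussed in this thread (cuDNN 7 for 1.5 is what step 3 of these notes installs), and `required_cuda_stack` is a hypothetical helper name.

```python
# Minimal sketch: CUDA/cuDNN major versions expected by a pip-installed
# tensorflow-gpu release. Only the pairs discussed in these notes are listed.
REQUIRED_STACK = {
    "1.4": {"cuda": 8, "cudnn": 6},
    "1.5": {"cuda": 9, "cudnn": 7},
}


def required_cuda_stack(tf_version):
    """Return the expected CUDA/cuDNN majors for a version such as '1.4.0'."""
    major_minor = ".".join(tf_version.split(".")[:2])
    if major_minor not in REQUIRED_STACK:
        raise KeyError("no entry for tensorflow-gpu " + tf_version)
    return REQUIRED_STACK[major_minor]


print(required_cuda_stack("1.4.0"))
```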

jinzishuai commented 6 years ago

1. Install CUDA Toolkit 9.0

Ref: http://docs.nvidia.com/cuda/cuda-installation-guide-linux/#axzz4VZnqTJ2A

Just follow the documentation. Essence: apt-get install cuda

jinzishuai commented 6 years ago

2. Install NVIDIA Driver

ref: http://www.webupd8.org/2016/06/how-to-install-latest-nvidia-drivers-in.html

Add the repo: add-apt-repository ppa:graphics-drivers/ppa
Install Additional Drivers in Ubuntu System Settings
(Preferred alternative to the GUI) This can also be done via the CLI, e.g. apt install nvidia-381
- Pay attention to which kernel version the modules are built for
- A new initramfs is generated
- Make sure we are booting from the proper kernel and initramfs

Confirmation

seki@Ubuntu-Shi-Dell-Precision-M3800:~$ lspci -k|grep -i nvidia
02:00.0 3D controller: NVIDIA Corporation GK107GLM [Quadro K1100M] (rev a1)
    Kernel driver in use: nvidia
    Kernel modules: nvidiafb, nouveau, nvidia_387_drm, nvidia_387
seki@Ubuntu-Shi-Dell-Precision-M3800:~$ lsmod|grep nvidia
nvidia_drm             45056  0
nvidia_modeset        897024  1 nvidia_drm
nvidia              13815808  1 nvidia_modeset
drm_kms_helper        151552  3 nouveau,i915,nvidia_drm
drm                   352256  6 nouveau,i915,ttm,nvidia_drm,drm_kms_helper
seki@Ubuntu-Shi-Dell-Precision-M3800:~$ cat /proc/driver/nvidia/gpus/0000\:02\:00.0/information 
Model:       Quadro K1100M
IRQ:         16
GPU UUID:    GPU-????????-????-????-????-????????????
Video BIOS:      ??.??.??.??.??
Bus Type:    PCIe
DMA Size:    36 bits
DMA Mask:    0xfffffffff
Bus Location:    0000:02:00.0
Device Minor:    0
seki@Ubuntu-Shi-Dell-Precision-M3800:~$
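
The lsmod check above can also be scripted. A minimal sketch, assuming the usual three-column lsmod layout; `loaded_nvidia_modules` is a hypothetical helper, and the sample text is copied from the output above.

```python
def loaded_nvidia_modules(lsmod_text):
    """Return the names of loaded kernel modules that start with 'nvidia'."""
    modules = []
    for line in lsmod_text.splitlines():
        fields = line.split()
        # The first column of lsmod output is the module name.
        if fields and fields[0].startswith("nvidia"):
            modules.append(fields[0])
    return modules


sample = """\
nvidia_drm             45056  0
nvidia_modeset        897024  1 nvidia_drm
nvidia              13815808  1 nvidia_modeset
drm_kms_helper        151552  3 nouveau,i915,nvidia_drm
"""
print(loaded_nvidia_modules(sample))
```

On a live system the text could come from running lsmod through the subprocess module instead of the pasted sample.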

jinzishuai commented 6 years ago

Verify CUDA and NVIDIA Driver with CUDA Samples

http://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#post-installation-actions

Install the samples by running cuda-install-samples-9.0.sh <dir>
Run make under /mnt/ShiJin/src/cuda-samples/NVIDIA_CUDA-9.0_Samples; it builds the binaries inside /mnt/ShiJin/src/cuda-samples/NVIDIA_CUDA-9.0_Samples/bin/x86_64/linux/release
- Or we can simply run make under the 1_Utilities/deviceQuery folder to generate the only binary we need
Run ./deviceQuery from either the bin/x86_64/linux/release folder or the 1_Utilities/deviceQuery source folder

Problem: CUDA driver version is insufficient for CUDA runtime version

seki@Ubuntu-Shi-Dell-Precision-M3800:/mnt/ShiJin/src/cuda-samples/NVIDIA_CUDA-9.0_Samples/1_Utilities/deviceQuery$ ./deviceQuery 
./deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

cudaGetDeviceCount returned 35
-> CUDA driver version is insufficient for CUDA runtime version
Result = FAIL

jinzishuai commented 6 years ago

Try to reinstall cuda: not helpful

seki@Ubuntu-Shi-Dell-Precision-M3800:/usr/local/cuda-9.0/bin$ sudo apt install cuda
Reading package lists... Done
Building dependency tree       
Reading state information... Done
The following packages were automatically installed and are no longer required:
  iucode-tool linux-headers-4.4.0-87
Use 'sudo apt autoremove' to remove them.
The following additional packages will be installed:
  cuda-9-0 cuda-demo-suite-9-0 cuda-drivers cuda-runtime-9-0 libcuda1-384 nvidia-384 nvidia-384-dev nvidia-modprobe
  nvidia-opencl-icd-384
The following packages will be REMOVED:
  libcuda1-387 nvidia-387 nvidia-opencl-icd-387
The following NEW packages will be installed:
  cuda cuda-9-0 cuda-demo-suite-9-0 cuda-drivers cuda-runtime-9-0 libcuda1-384 nvidia-384 nvidia-384-dev nvidia-modprobe
  nvidia-opencl-icd-384
0 upgraded, 10 newly installed, 3 to remove and 311 not upgraded.
Need to get 80.6 MB/84.5 MB of archives.
After this operation, 353 kB disk space will be freed.
Do you want to continue? [Y/n] Y

It seems that the cuda package wants to work with nvidia-384 only, not with the already installed nvidia-387.

But after switching to nvidia-384, the same problem persisted.

jinzishuai commented 6 years ago

Check Version Support Matrix: CUDA-9.0 does work with nvidia-384

https://github.com/NVIDIA/nvidia-docker/wiki/CUDA#requirements
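
The matrix check can be written down as a comparison of driver major versions. A minimal sketch: the only row included is the CUDA 9.0 / 384 pairing noted above, and `driver_ok` is a hypothetical helper name.

```python
# Minimum driver major version per CUDA release. Only the row confirmed in
# this thread is included; other rows would come from the linked matrix.
MIN_DRIVER_MAJOR = {"9.0": 384}


def driver_ok(driver_version, cuda_version):
    """Return True if a driver version such as '384.98' is new enough."""
    driver_major = int(driver_version.split(".")[0])
    return driver_major >= MIN_DRIVER_MAJOR[cuda_version]


print(driver_ok("384.98", "9.0"))
```

If this check passes while deviceQuery still reports an insufficient driver, the cause is likely somewhere other than the driver version itself.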

jinzishuai commented 6 years ago

Solution: Reconfigure the GL LD LIB PATH

seki@Ubuntu-Shi-Dell-Precision-M3800:~/src/cuda-samples/NVIDIA_CUDA-9.0_Samples/1_Utilities/deviceQuery$ sudo update-alternatives --config x86_64-linux-gnu_gl_conf
[sudo] password for seki: 
There are 3 choices for the alternative x86_64-linux-gnu_gl_conf (providing /etc/ld.so.conf.d/x86_64-linux-gnu_GL.conf).

  Selection    Path                                       Priority   Status
------------------------------------------------------------
  0            /usr/lib/nvidia-384/ld.so.conf              8604      auto mode
  1            /usr/lib/nvidia-384-prime/ld.so.conf        8603      manual mode
  2            /usr/lib/nvidia-384/ld.so.conf              8604      manual mode
* 3            /usr/lib/x86_64-linux-gnu/mesa/ld.so.conf   500       manual mode

Press <enter> to keep the current choice[*], or type selection number: 0
update-alternatives: using /usr/lib/nvidia-384/ld.so.conf to provide /etc/ld.so.conf.d/x86_64-linux-gnu_GL.conf (x86_64-linux-gnu_gl_conf) in auto mode
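
The alternative selected here decides which directory is added to the dynamic loader's search path. One way to see whether the driver's user-space library can actually be opened is to ask the loader directly. This is a sketch under the assumption that the library is named libcuda.so.1 on this system; `can_load` is a hypothetical helper.

```python
import ctypes


def can_load(library_name):
    """Return True if the dynamic loader can find and open the library."""
    try:
        ctypes.CDLL(library_name)
    except OSError:
        # Raised when the library or one of its dependencies is not found.
        return False
    return True


# Assumed library name; the result depends on the machine this runs on.
print("libcuda.so.1 loadable:", can_load("libcuda.so.1"))
```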

Working Example

seki@Ubuntu-Shi-Dell-Precision-M3800:~/src/cuda-samples/NVIDIA_CUDA-9.0_Samples/1_Utilities/deviceQuery$ ./deviceQuery 
./deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "Quadro K1100M"
  CUDA Driver Version / Runtime Version          9.0 / 9.0
  CUDA Capability Major/Minor version number:    3.0
  Total amount of global memory:                 2002 MBytes (2098724864 bytes)
  ( 2) Multiprocessors, (192) CUDA Cores/MP:     384 CUDA Cores
  GPU Max Clock rate:                            706 MHz (0.71 GHz)
  Memory Clock rate:                             1400 Mhz
  Memory Bus Width:                              128-bit
  L2 Cache Size:                                 262144 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
  Maximum Layered 1D Texture Size, (num) layers  1D=(16384), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(16384, 16384), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  2048
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 1 copy engine(s)
  Run time limit on kernels:                     No
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  Device supports Unified Addressing (UVA):      Yes
  Supports Cooperative Kernel Launch:            No
  Supports MultiDevice Co-op Kernel Launch:      No
  Device PCI Domain ID / Bus ID / location ID:   0 / 2 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 9.0, CUDA Runtime Version = 9.0, NumDevs = 1
Result = PASS
seki@Ubuntu-Shi-Dell-Precision-M3800:~/src/cuda-samples/NVIDIA_CUDA-9.0_Samples/1_Utilities/deviceQuery$
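
For repeated checks, the pass/fail line and the version pair can be pulled out of the deviceQuery output. A minimal sketch that relies only on the two line formats visible in the logs in this thread; `parse_device_query` is a hypothetical helper.

```python
import re


def parse_device_query(output):
    """Extract the result and driver/runtime versions from deviceQuery text."""
    result = re.search(r"Result = (PASS|FAIL)", output)
    versions = re.search(
        r"CUDA Driver Version = ([\d.]+), CUDA Runtime Version = ([\d.]+)",
        output,
    )
    return {
        "passed": bool(result) and result.group(1) == "PASS",
        "driver": versions.group(1) if versions else None,
        "runtime": versions.group(2) if versions else None,
    }


working = (
    "deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 9.0, "
    "CUDA Runtime Version = 9.0, NumDevs = 1\nResult = PASS\n"
)
print(parse_device_query(working))
```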

jinzishuai commented 6 years ago

Also note my ~/.bashrc

export LD_LIBRARY_PATH=/usr/local/cuda-9.0/lib64:/usr/lib/nvidia-384/${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
export PATH=/usr/local/cuda/bin:/usr/lib/nvidia-384/bin:$PATH
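
The `${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}` idiom appends a colon and the old value only when the variable is already set and non-empty, which avoids a trailing empty entry (the loader treats an empty entry as the current directory). The same logic as a minimal sketch; `prepend_paths` is a hypothetical helper.

```python
def prepend_paths(new_dirs, existing):
    """Mimic DIRS${VAR:+:${VAR}}: add ':existing' only if it is non-empty."""
    joined = ":".join(new_dirs)
    return joined + (":" + existing if existing else "")


dirs = ["/usr/local/cuda-9.0/lib64", "/usr/lib/nvidia-384/"]
print(prepend_paths(dirs, ""))
print(prepend_paths(dirs, "/opt/lib"))
```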

jinzishuai commented 6 years ago

bandwidth test

seki@Ubuntu-Shi-Dell-Precision-M3800:~/src/cuda-samples/NVIDIA_CUDA-9.0_Samples/1_Utilities/bandwidthTest$ ./bandwidthTest 
[CUDA Bandwidth Test] - Starting...
Running on...

 Device 0: Quadro K1100M
 Quick Mode

 Host to Device Bandwidth, 1 Device(s)
 PINNED Memory Transfers
   Transfer Size (Bytes)    Bandwidth(MB/s)
   33554432         9731.7

 Device to Host Bandwidth, 1 Device(s)
 PINNED Memory Transfers
   Transfer Size (Bytes)    Bandwidth(MB/s)
   33554432         9711.7

 Device to Device Bandwidth, 1 Device(s)
 PINNED Memory Transfers
   Transfer Size (Bytes)    Bandwidth(MB/s)
   33554432         27980.0

Result = PASS

NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.
seki@Ubuntu-Shi-Dell-Precision-M3800:~/src/cuda-samples/NVIDIA_CUDA-9.0_Samples/1_Utilities/bandwidthTest$

jinzishuai commented 6 years ago

After Reboot, nvidia is not loading, have to change back to `/usr/lib/x86_64-linux-gnu/mesa/ld.so.conf`

root@Ubuntu-Shi-Dell-Precision-M3800:~# update-alternatives  --config x86_64-linux-gnu_gl_conf 
There are 3 choices for the alternative x86_64-linux-gnu_gl_conf (providing /etc/ld.so.conf.d/x86_64-linux-gnu_GL.conf).

  Selection    Path                                       Priority   Status
------------------------------------------------------------
  0            /usr/lib/nvidia-384/ld.so.conf              8604      auto mode
  1            /usr/lib/nvidia-384-prime/ld.so.conf        8603      manual mode
  2            /usr/lib/nvidia-384/ld.so.conf              8604      manual mode
* 3            /usr/lib/x86_64-linux-gnu/mesa/ld.so.conf   500       manual mode

At this state:

* unable to log in to the local machine with X
* nvidia driver is loaded
* nvidia-smi runs properly


root@Ubuntu-Shi-Dell-Precision-M3800:~# nvidia-smi 
Wed Nov 15 21:27:28 2017       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.98                 Driver Version: 384.98                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Quadro K1100M       Off  | 00000000:02:00.0 Off |                  N/A |
| N/A   49C    P0    N/A /  N/A |      0MiB /  2001MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
root@Ubuntu-Shi-Dell-Precision-M3800:~#

* **sample code fails**

seki@Ubuntu-Shi-Dell-Precision-M3800:~/src/cuda-samples/NVIDIA_CUDA-9.0_Samples/1_Utilities/bandwidthTest$ ./bandwidthTest
[CUDA Bandwidth Test] - Starting...
Running on...

cudaGetDeviceProperties returned 30
-> unknown error
CUDA error at bandwidthTest.cu:242 code=30(cudaErrorUnknown) "cudaSetDevice(currentDevice)"

jinzishuai commented 6 years ago

Summary

boot with /usr/lib/x86_64-linux-gnu/mesa/ld.so.conf to load nvidia driver
once booted, switch to /usr/lib/nvidia-384/ld.so.conf to use it properly

This way

X works
sample code runs
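
The post-boot switch could be scripted instead of picked interactively. A sketch only: it builds the commands without running them, uses `update-alternatives --set` rather than the interactive `--config` shown in this thread, and the `ldconfig` step is an assumption that does not appear anywhere in these notes.

```python
GL_ALTERNATIVE = "x86_64-linux-gnu_gl_conf"
NVIDIA_CONF = "/usr/lib/nvidia-384/ld.so.conf"
MESA_CONF = "/usr/lib/x86_64-linux-gnu/mesa/ld.so.conf"


def switch_commands(conf_path):
    """Build (but do not run) the commands that select a GL ld.so.conf."""
    return [
        ["update-alternatives", "--set", GL_ALTERNATIVE, conf_path],
        ["ldconfig"],  # assumption: refresh the loader cache afterwards
    ]


# After booting with the mesa conf, switch to the nvidia-384 conf.
for command in switch_commands(NVIDIA_CONF):
    print("sudo " + " ".join(command))
```

Calling `switch_commands(MESA_CONF)` gives the reverse switch to use before the next reboot.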

jinzishuai commented 6 years ago

3. Install cuDNN 7

Register and download from https://developer.nvidia.com/cudnn
Navigate to your directory containing the cuDNN Debian file.
Install the runtime library, for example:

sudo dpkg -i libcudnn7_7.0.3.11-1+cuda9.0_amd64.deb

Install the developer library, for example:

sudo dpkg -i libcudnn7-dev_7.0.3.11-1+cuda9.0_amd64.deb

Install the code samples and the cuDNN Library User Guide, for example:

sudo dpkg -i libcudnn7-doc_7.0.3.11-1+cuda9.0_amd64.deb

jinzishuai commented 6 years ago

cuDNN-Installation-Guide.pdf

jinzishuai commented 6 years ago

Verify cuDNN

seki@Ubuntu-Shi-Dell-Precision-M3800:/mnt/ShiJin/src/cuDNN$ cp -r /usr/src/cudnn_samples_v7/
conv_sample/ mnistCUDNN/  RNN/         
seki@Ubuntu-Shi-Dell-Precision-M3800:/mnt/ShiJin/src/cuDNN$ cp -r /usr/src/cudnn_samples_v7/ .
seki@Ubuntu-Shi-Dell-Precision-M3800:/mnt/ShiJin/src/cuDNN$ ls
cudnn_samples_v7                        libcudnn7-dev_7.0.4.31-1+cuda9.0_amd64.deb
libcudnn7_7.0.4.31-1+cuda9.0_amd64.deb  libcudnn7-doc_7.0.4.31-1+cuda9.0_amd64.deb
seki@Ubuntu-Shi-Dell-Precision-M3800:/mnt/ShiJin/src/cuDNN$ cd cudnn_samples_v7/
seki@Ubuntu-Shi-Dell-Precision-M3800:/mnt/ShiJin/src/cuDNN/cudnn_samples_v7$ ls
conv_sample  mnistCUDNN  RNN
seki@Ubuntu-Shi-Dell-Precision-M3800:/mnt/ShiJin/src/cuDNN/cudnn_samples_v7$ cd mnistCUDNN/
seki@Ubuntu-Shi-Dell-Precision-M3800:/mnt/ShiJin/src/cuDNN/cudnn_samples_v7/mnistCUDNN$ ls
data  error_util.h  fp16_dev.cu  fp16_dev.h  fp16_emu.cpp  fp16_emu.h  FreeImage  gemv.h  Makefile  mnistCUDNN.cpp  readme.txt
seki@Ubuntu-Shi-Dell-Precision-M3800:/mnt/ShiJin/src/cuDNN/cudnn_samples_v7/mnistCUDNN$ make clean && make
rm -rf *o
rm -rf mnistCUDNN
/usr/local/cuda/bin/nvcc -ccbin g++ -I/usr/local/cuda/include -IFreeImage/include  -m64    -gencode arch=compute_30,code=sm_30 -gencode arch=compute_35,code=sm_35 -gencode arch=compute_50,code=sm_50 -gencode arch=compute_53,code=sm_53 -gencode arch=compute_53,code=compute_53 -o fp16_dev.o -c fp16_dev.cu
g++ -I/usr/local/cuda/include -IFreeImage/include   -o fp16_emu.o -c fp16_emu.cpp
g++ -I/usr/local/cuda/include -IFreeImage/include   -o mnistCUDNN.o -c mnistCUDNN.cpp
/usr/local/cuda/bin/nvcc -ccbin g++   -m64      -gencode arch=compute_30,code=sm_30 -gencode arch=compute_35,code=sm_35 -gencode arch=compute_50,code=sm_50 -gencode arch=compute_53,code=sm_53 -gencode arch=compute_53,code=compute_53 -o mnistCUDNN fp16_dev.o fp16_emu.o mnistCUDNN.o  -LFreeImage/lib/linux/x86_64 -LFreeImage/lib/linux -lcudart -lcublas -lcudnn -lfreeimage -lstdc++ -lm
seki@Ubuntu-Shi-Dell-Precision-M3800:/mnt/ShiJin/src/cuDNN/cudnn_samples_v7/mnistCUDNN$ ls
data          fp16_dev.cu  fp16_dev.o    fp16_emu.h  FreeImage  Makefile    mnistCUDNN.cpp  readme.txt
error_util.h  fp16_dev.h   fp16_emu.cpp  fp16_emu.o  gemv.h     mnistCUDNN  mnistCUDNN.o
seki@Ubuntu-Shi-Dell-Precision-M3800:/mnt/ShiJin/src/cuDNN/cudnn_samples_v7/mnistCUDNN$ ./mnistCUDNN 
cudnnGetVersion() : 7004 , CUDNN_VERSION from cudnn.h : 7004 (7.0.4)
Host compiler version : GCC 5.4.0
There are 1 CUDA capable devices on your machine :
device 0 : sms  2  Capabilities 3.0, SmClock 705.5 Mhz, MemSize (Mb) 2001, MemClock 1400.0 Mhz, Ecc=0, boardGroupID=0
Using device 0

Testing single precision
...
Test passed!
seki@Ubuntu-Shi-Dell-Precision-M3800:/mnt/ShiJin/src/cuDNN/cudnn_samples_v7/mnistCUDNN$
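
The integer printed by cudnnGetVersion() packs the version as major * 1000 + minor * 100 + patch, which is how 7004 reads as 7.0.4 in the output above. A minimal sketch of the decoding; `decode_cudnn_version` is a hypothetical helper.

```python
def decode_cudnn_version(value):
    """Split a cuDNN 7 style version integer into (major, minor, patch)."""
    major, rest = divmod(value, 1000)
    minor, patch = divmod(rest, 100)
    return major, minor, patch


print(decode_cudnn_version(7004))
```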

jinzishuai commented 6 years ago

After tensorflow-gpu installation, verification failed

seki@Ubuntu-Shi-Dell-Precision-M3800:/mnt/ShiJin/src/cuDNN/cudnn_samples_v7/mnistCUDNN$ python3
Python 3.5.2 (default, Sep 14 2017, 22:51:06) 
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/pywrap_tensorflow.py", line 58, in <module>
    from tensorflow.python.pywrap_tensorflow_internal import *
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 28, in <module>
    _pywrap_tensorflow_internal = swig_import_helper()
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 24, in swig_import_helper
    _mod = imp.load_module('_pywrap_tensorflow_internal', fp, pathname, description)
  File "/usr/lib/python3.5/imp.py", line 242, in load_module
    return load_dynamic(name, filename, file)
  File "/usr/lib/python3.5/imp.py", line 342, in load_dynamic
    return _load(spec)
ImportError: libcublas.so.8.0: cannot open shared object file: No such file or directory

It is looking for CUDA-8 while we have CUDA-9 installed

seki@Ubuntu-Shi-Dell-Precision-M3800:~$ ls /usr/local/cuda/lib64/libcublas* -lh
-rw-r--r-- 1 root root 67M Sep  2 04:39 /usr/local/cuda/lib64/libcublas_device.a
lrwxrwxrwx 1 root root  16 Sep  2 04:40 /usr/local/cuda/lib64/libcublas.so -> libcublas.so.9.0
lrwxrwxrwx 1 root root  20 Sep  2 04:40 /usr/local/cuda/lib64/libcublas.so.9.0 -> libcublas.so.9.0.176
-rw-r--r-- 1 root root 51M Sep  2 04:39 /usr/local/cuda/lib64/libcublas.so.9.0.176
-rw-r--r-- 1 root root 57M Sep  2 04:39 /usr/local/cuda/lib64/libcublas_static.a
seki@Ubuntu-Shi-Dell-Precision-M3800:~$
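
The library name in the ImportError is what reveals the CUDA version the wheel was built against. A minimal sketch that extracts it from the message text; `missing_library` is a hypothetical helper and the sample message is the one from the traceback above.

```python
import re


def missing_library(message):
    """Return (library, version) from a 'cannot open shared object' error."""
    match = re.search(
        r"(lib\w+)\.so\.([\d.]+): cannot open shared object file", message
    )
    if match is None:
        return None
    return match.group(1), match.group(2)


error = (
    "ImportError: libcublas.so.8.0: cannot open shared object file: "
    "No such file or directory"
)
print(missing_library(error))
```

Here the wheel asks for libcublas 8.0, while only libcublas.so.9.0 exists under /usr/local/cuda/lib64.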

jinzishuai commented 6 years ago

After tf-1.5 is released

Installed 1.5 and it works

performance test as in #31

Problem with cuda compute capability 3.0

jinzishuai commented 6 years ago

Conclusion

The Linux tf-1.5 build does not support my GPU but the Windows build does.
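
The mismatch comes down to comparing compute capability tuples. A sketch only: the device value (3.0) is taken from the deviceQuery output earlier in this thread, but the minimum of 3.5 for the prebuilt Linux wheel is an assumption that is not stated anywhere in these notes, so it is passed in as a parameter. `capability_supported` is a hypothetical helper.

```python
def capability_supported(device, minimum):
    """Compare (major, minor) compute capability tuples."""
    return tuple(device) >= tuple(minimum)


QUADRO_K1100M = (3, 0)          # from the deviceQuery output in this thread
ASSUMED_WHEEL_MINIMUM = (3, 5)  # assumption, not confirmed by these notes

print(capability_supported(QUADRO_K1100M, ASSUMED_WHEEL_MINIMUM))
```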

jinzishuai / learn2deeplearn

Install TensorFlow on Ubuntu-16.04 with GPU Acceleration #36
