Ref: http://docs.nvidia.com/cuda/cuda-installation-guide-linux/#axzz4VZnqTJ2A
Just follow the documentation.
Essence: apt-get install cuda
ref: http://www.webupd8.org/2016/06/how-to-install-latest-nvidia-drivers-in.html
add-apt-repository ppa:graphics-drivers/ppa
apt install nvidia-381
seki@Ubuntu-Shi-Dell-Precision-M3800:~$ lspci -k|grep -i nvidia
02:00.0 3D controller: NVIDIA Corporation GK107GLM [Quadro K1100M] (rev a1)
Kernel driver in use: nvidia
Kernel modules: nvidiafb, nouveau, nvidia_387_drm, nvidia_387
seki@Ubuntu-Shi-Dell-Precision-M3800:~$ lsmod|grep nvidia
nvidia_drm 45056 0
nvidia_modeset 897024 1 nvidia_drm
nvidia 13815808 1 nvidia_modeset
drm_kms_helper 151552 3 nouveau,i915,nvidia_drm
drm 352256 6 nouveau,i915,ttm,nvidia_drm,drm_kms_helper
seki@Ubuntu-Shi-Dell-Precision-M3800:~$ cat /proc/driver/nvidia/gpus/0000\:02\:00.0/information
Model: Quadro K1100M
IRQ: 16
GPU UUID: GPU-????????-????-????-????-????????????
Video BIOS: ??.??.??.??.??
Bus Type: PCIe
DMA Size: 36 bits
DMA Mask: 0xfffffffff
Bus Location: 0000:02:00.0
Device Minor: 0
seki@Ubuntu-Shi-Dell-Precision-M3800:~$
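The information file shown above is a list of "Key: value" lines, so it is easy to read from a script when checking that the driver has bound to the GPU. A minimal sketch, assuming that layout holds; `parse_gpu_information` is a hypothetical helper name, and the sample text is copied from the output above.

```python
def parse_gpu_information(text):
    """Parse 'Key: value' lines from /proc/driver/nvidia/gpus/<bus>/information."""
    info = {}
    for line in text.splitlines():
        # Split on the first colon only, so values such as 0000:02:00.0 stay intact.
        key, sep, value = line.partition(":")
        if sep:  # ignore lines that contain no colon
            info[key.strip()] = value.strip()
    return info


sample = """Model: Quadro K1100M
IRQ: 16
Bus Type: PCIe
Bus Location: 0000:02:00.0
Device Minor: 0"""

gpu = parse_gpu_information(sample)
print(gpu["Model"], "at", gpu["Bus Location"])
```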
http://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#post-installation-actions
Install the samples with cuda-install-samples-9.0.sh <dir>
Run make under /mnt/ShiJin/src/cuda-samples/NVIDIA_CUDA-9.0_Samples to build all samples; the binaries end up inside /mnt/ShiJin/src/cuda-samples/NVIDIA_CUDA-9.0_Samples/bin/x86_64/linux/release
Alternatively, run make under the 1_Utilities/deviceQuery folder to generate the only binary we need.
Run ./deviceQuery from either the bin/x86_64/linux/release folder or the source folder of 1_Utilities/deviceQuery
seki@Ubuntu-Shi-Dell-Precision-M3800:/mnt/ShiJin/src/cuda-samples/NVIDIA_CUDA-9.0_Samples/1_Utilities/deviceQuery$ ./deviceQuery
./deviceQuery Starting...
CUDA Device Query (Runtime API) version (CUDART static linking)
cudaGetDeviceCount returned 35
-> CUDA driver version is insufficient for CUDA runtime version
Result = FAIL
seki@Ubuntu-Shi-Dell-Precision-M3800:/usr/local/cuda-9.0/bin$ sudo apt install cuda
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following packages were automatically installed and are no longer required:
iucode-tool linux-headers-4.4.0-87
Use 'sudo apt autoremove' to remove them.
The following additional packages will be installed:
cuda-9-0 cuda-demo-suite-9-0 cuda-drivers cuda-runtime-9-0 libcuda1-384 nvidia-384 nvidia-384-dev nvidia-modprobe
nvidia-opencl-icd-384
The following packages will be REMOVED:
libcuda1-387 nvidia-387 nvidia-opencl-icd-387
The following NEW packages will be installed:
cuda cuda-9-0 cuda-demo-suite-9-0 cuda-drivers cuda-runtime-9-0 libcuda1-384 nvidia-384 nvidia-384-dev nvidia-modprobe
nvidia-opencl-icd-384
0 upgraded, 10 newly installed, 3 to remove and 311 not upgraded.
Need to get 80.6 MB/84.5 MB of archives.
After this operation, 353 kB disk space will be freed.
Do you want to continue? [Y/n] Y
It seems that the cuda package wants to work with nvidia-384 only, not with the nvidia-387 driver that was already installed.
However, the same problem persisted after switching to nvidia-384.
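Error 35 complains about the driver version, so it is worth comparing the installed driver with the toolkit's minimum. The sketch below assumes a minimum of 384.81 for CUDA 9.0 on Linux; that number is recalled from NVIDIA's compatibility table, not stated in this thread, and should be checked against the release notes. `MIN_DRIVER`, `parse_version` and `driver_is_sufficient` are illustrative names.

```python
# Minimum Linux driver per CUDA toolkit version.
# ASSUMPTION: the 9.0 entry is recalled from NVIDIA's compatibility table.
MIN_DRIVER = {"9.0": (384, 81)}


def parse_version(text):
    """Turn '384.98' into (384, 98) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def driver_is_sufficient(driver_version, cuda_version):
    """True if the driver meets the assumed minimum for the given toolkit."""
    return parse_version(driver_version) >= MIN_DRIVER[cuda_version]


print(driver_is_sufficient("384.98", "9.0"))  # driver version reported by nvidia-smi in this thread
print(driver_is_sufficient("375.26", "9.0"))  # an older driver, for contrast
```

Under that assumption both 384.98 and any 387 driver clear the bar, which would be consistent with the downgrade alone not fixing the error.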
https://github.com/NVIDIA/nvidia-docker/wiki/CUDA#requirements
seki@Ubuntu-Shi-Dell-Precision-M3800:~/src/cuda-samples/NVIDIA_CUDA-9.0_Samples/1_Utilities/deviceQuery$ sudo update-alternatives --config x86_64-linux-gnu_gl_conf
[sudo] password for seki:
There are 3 choices for the alternative x86_64-linux-gnu_gl_conf (providing /etc/ld.so.conf.d/x86_64-linux-gnu_GL.conf).
Selection Path Priority Status
------------------------------------------------------------
0 /usr/lib/nvidia-384/ld.so.conf 8604 auto mode
1 /usr/lib/nvidia-384-prime/ld.so.conf 8603 manual mode
2 /usr/lib/nvidia-384/ld.so.conf 8604 manual mode
* 3 /usr/lib/x86_64-linux-gnu/mesa/ld.so.conf 500 manual mode
Press <enter> to keep the current choice[*], or type selection number: 0
update-alternatives: using /usr/lib/nvidia-384/ld.so.conf to provide /etc/ld.so.conf.d/x86_64-linux-gnu_GL.conf (x86_64-linux-gnu_gl_conf) in auto mode
seki@Ubuntu-Shi-Dell-Precision-M3800:~/src/cuda-samples/NVIDIA_CUDA-9.0_Samples/1_Utilities/deviceQuery$ ./deviceQuery
./deviceQuery Starting...
CUDA Device Query (Runtime API) version (CUDART static linking)
Detected 1 CUDA Capable device(s)
Device 0: "Quadro K1100M"
CUDA Driver Version / Runtime Version 9.0 / 9.0
CUDA Capability Major/Minor version number: 3.0
Total amount of global memory: 2002 MBytes (2098724864 bytes)
( 2) Multiprocessors, (192) CUDA Cores/MP: 384 CUDA Cores
GPU Max Clock rate: 706 MHz (0.71 GHz)
Memory Clock rate: 1400 Mhz
Memory Bus Width: 128-bit
L2 Cache Size: 262144 bytes
Maximum Texture Dimension Size (x,y,z) 1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
Maximum Layered 1D Texture Size, (num) layers 1D=(16384), 2048 layers
Maximum Layered 2D Texture Size, (num) layers 2D=(16384, 16384), 2048 layers
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per multiprocessor: 2048
Maximum number of threads per block: 1024
Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Concurrent copy and kernel execution: Yes with 1 copy engine(s)
Run time limit on kernels: No
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Disabled
Device supports Unified Addressing (UVA): Yes
Supports Cooperative Kernel Launch: No
Supports MultiDevice Co-op Kernel Launch: No
Device PCI Domain ID / Bus ID / location ID: 0 / 2 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 9.0, CUDA Runtime Version = 9.0, NumDevs = 1
Result = PASS
seki@Ubuntu-Shi-Dell-Precision-M3800:~/src/cuda-samples/NVIDIA_CUDA-9.0_Samples/1_Utilities/deviceQuery$
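What made deviceQuery pass was the x86_64-linux-gnu_gl_conf alternative moving from the mesa entry to the nvidia-384 one. A minimal sketch of how the selected row could be read from the `update-alternatives --config` listing; `selected_alternative` is a hypothetical helper, and the parsing assumes the table layout shown in this thread.

```python
def selected_alternative(config_output):
    """Return the path on the row marked with '*' in the --config listing."""
    for line in config_output.splitlines():
        stripped = line.strip()
        if stripped.startswith("*"):
            # Remaining fields: selection number, path, priority, status words.
            fields = stripped.lstrip("*").split()
            if len(fields) >= 2:
                return fields[1]
    return None


listing = """Selection Path Priority Status
------------------------------------------------------------
0 /usr/lib/nvidia-384/ld.so.conf 8604 auto mode
1 /usr/lib/nvidia-384-prime/ld.so.conf 8603 manual mode
2 /usr/lib/nvidia-384/ld.so.conf 8604 manual mode
* 3 /usr/lib/x86_64-linux-gnu/mesa/ld.so.conf 500 manual mode
Press <enter> to keep the current choice[*], or type selection number: 0"""

current = selected_alternative(listing)
print(current)
print("NVIDIA libraries selected:", "nvidia" in current)
```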
export LD_LIBRARY_PATH=/usr/local/cuda-9.0/lib64:/usr/lib/nvidia-384/\
${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
export PATH=/usr/local/cuda/bin:/usr/lib/nvidia-384/bin:$PATH
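The `${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}` expansion appends a colon plus the old value only when the variable is already set and non-empty, which avoids a trailing empty entry (the dynamic loader treats an empty entry as the current directory). A small sketch of the same logic; `prepend_search_path` is an illustrative name, not part of any tool.

```python
def prepend_search_path(new_dirs, existing=""):
    """Mimic DIRS${VAR:+:${VAR}}: add ':' + existing only if existing is non-empty."""
    joined = ":".join(new_dirs)
    return joined + (":" + existing if existing else "")


cuda_dirs = ["/usr/local/cuda-9.0/lib64", "/usr/lib/nvidia-384/"]

print(prepend_search_path(cuda_dirs))              # variable previously unset
print(prepend_search_path(cuda_dirs, "/opt/lib"))  # variable previously set
```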
seki@Ubuntu-Shi-Dell-Precision-M3800:~/src/cuda-samples/NVIDIA_CUDA-9.0_Samples/1_Utilities/bandwidthTest$ ./bandwidthTest
[CUDA Bandwidth Test] - Starting...
Running on...
Device 0: Quadro K1100M
Quick Mode
Host to Device Bandwidth, 1 Device(s)
PINNED Memory Transfers
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 9731.7
Device to Host Bandwidth, 1 Device(s)
PINNED Memory Transfers
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 9711.7
Device to Device Bandwidth, 1 Device(s)
PINNED Memory Transfers
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 27980.0
Result = PASS
NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.
seki@Ubuntu-Shi-Dell-Precision-M3800:~/src/cuda-samples/NVIDIA_CUDA-9.0_Samples/1_Utilities/bandwidthTest$
With /usr/lib/x86_64-linux-gnu/mesa/ld.so.conf selected:
root@Ubuntu-Shi-Dell-Precision-M3800:~# update-alternatives --config x86_64-linux-gnu_gl_conf
There are 3 choices for the alternative x86_64-linux-gnu_gl_conf (providing /etc/ld.so.conf.d/x86_64-linux-gnu_GL.conf).
Selection Path Priority Status
------------------------------------------------------------
0 /usr/lib/nvidia-384/ld.so.conf 8604 auto mode
1 /usr/lib/nvidia-384-prime/ld.so.conf 8603 manual mode
2 /usr/lib/nvidia-384/ld.so.conf 8604 manual mode
* 3 /usr/lib/x86_64-linux-gnu/mesa/ld.so.conf 500 manual mode
In this state:
* **nvidia-smi runs properly**
root@Ubuntu-Shi-Dell-Precision-M3800:~# nvidia-smi
Wed Nov 15 21:27:28 2017
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.98 Driver Version: 384.98 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Quadro K1100M Off | 00000000:02:00.0 Off | N/A |
| N/A 49C P0 N/A / N/A | 0MiB / 2001MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
root@Ubuntu-Shi-Dell-Precision-M3800:~#
* **sample code fails**
seki@Ubuntu-Shi-Dell-Precision-M3800:~/src/cuda-samples/NVIDIA_CUDA-9.0_Samples/1_Utilities/bandwidthTest$ ./bandwidthTest
[CUDA Bandwidth Test] - Starting...
Running on...
cudaGetDeviceProperties returned 30
-> unknown error
CUDA error at bandwidthTest.cu:242 code=30(cudaErrorUnknown) "cudaSetDevice(currentDevice)"
So the alternative has to be switched from /usr/lib/x86_64-linux-gnu/mesa/ld.so.conf to /usr/lib/nvidia-384/ld.so.conf to load the NVIDIA driver libraries and use the GPU properly.
sudo dpkg -i libcudnn7_7.0.4.31-1+cuda9.0_amd64.deb
sudo dpkg -i libcudnn7-dev_7.0.4.31-1+cuda9.0_amd64.deb
sudo dpkg -i libcudnn7-doc_7.0.4.31-1+cuda9.0_amd64.deb
seki@Ubuntu-Shi-Dell-Precision-M3800:/mnt/ShiJin/src/cuDNN$ cp -r /usr/src/cudnn_samples_v7/
conv_sample/ mnistCUDNN/ RNN/
seki@Ubuntu-Shi-Dell-Precision-M3800:/mnt/ShiJin/src/cuDNN$ cp -r /usr/src/cudnn_samples_v7/ .
seki@Ubuntu-Shi-Dell-Precision-M3800:/mnt/ShiJin/src/cuDNN$ ls
cudnn_samples_v7 libcudnn7-dev_7.0.4.31-1+cuda9.0_amd64.deb
libcudnn7_7.0.4.31-1+cuda9.0_amd64.deb libcudnn7-doc_7.0.4.31-1+cuda9.0_amd64.deb
seki@Ubuntu-Shi-Dell-Precision-M3800:/mnt/ShiJin/src/cuDNN$ cd cudnn_samples_v7/
seki@Ubuntu-Shi-Dell-Precision-M3800:/mnt/ShiJin/src/cuDNN/cudnn_samples_v7$ ls
conv_sample mnistCUDNN RNN
seki@Ubuntu-Shi-Dell-Precision-M3800:/mnt/ShiJin/src/cuDNN/cudnn_samples_v7$ cd mnistCUDNN/
seki@Ubuntu-Shi-Dell-Precision-M3800:/mnt/ShiJin/src/cuDNN/cudnn_samples_v7/mnistCUDNN$ ls
data error_util.h fp16_dev.cu fp16_dev.h fp16_emu.cpp fp16_emu.h FreeImage gemv.h Makefile mnistCUDNN.cpp readme.txt
seki@Ubuntu-Shi-Dell-Precision-M3800:/mnt/ShiJin/src/cuDNN/cudnn_samples_v7/mnistCUDNN$ make clean && make
rm -rf *o
rm -rf mnistCUDNN
/usr/local/cuda/bin/nvcc -ccbin g++ -I/usr/local/cuda/include -IFreeImage/include -m64 -gencode arch=compute_30,code=sm_30 -gencode arch=compute_35,code=sm_35 -gencode arch=compute_50,code=sm_50 -gencode arch=compute_53,code=sm_53 -gencode arch=compute_53,code=compute_53 -o fp16_dev.o -c fp16_dev.cu
g++ -I/usr/local/cuda/include -IFreeImage/include -o fp16_emu.o -c fp16_emu.cpp
g++ -I/usr/local/cuda/include -IFreeImage/include -o mnistCUDNN.o -c mnistCUDNN.cpp
/usr/local/cuda/bin/nvcc -ccbin g++ -m64 -gencode arch=compute_30,code=sm_30 -gencode arch=compute_35,code=sm_35 -gencode arch=compute_50,code=sm_50 -gencode arch=compute_53,code=sm_53 -gencode arch=compute_53,code=compute_53 -o mnistCUDNN fp16_dev.o fp16_emu.o mnistCUDNN.o -LFreeImage/lib/linux/x86_64 -LFreeImage/lib/linux -lcudart -lcublas -lcudnn -lfreeimage -lstdc++ -lm
seki@Ubuntu-Shi-Dell-Precision-M3800:/mnt/ShiJin/src/cuDNN/cudnn_samples_v7/mnistCUDNN$ ls
data fp16_dev.cu fp16_dev.o fp16_emu.h FreeImage Makefile mnistCUDNN.cpp readme.txt
error_util.h fp16_dev.h fp16_emu.cpp fp16_emu.o gemv.h mnistCUDNN mnistCUDNN.o
seki@Ubuntu-Shi-Dell-Precision-M3800:/mnt/ShiJin/src/cuDNN/cudnn_samples_v7/mnistCUDNN$ ./mnistCUDNN
cudnnGetVersion() : 7004 , CUDNN_VERSION from cudnn.h : 7004 (7.0.4)
Host compiler version : GCC 5.4.0
There are 1 CUDA capable devices on your machine :
device 0 : sms 2 Capabilities 3.0, SmClock 705.5 Mhz, MemSize (Mb) 2001, MemClock 1400.0 Mhz, Ecc=0, boardGroupID=0
Using device 0
Testing single precision
...
Test passed!
seki@Ubuntu-Shi-Dell-Precision-M3800:/mnt/ShiJin/src/cuDNN/cudnn_samples_v7/mnistCUDNN$
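The sample prints the cuDNN version both as the integer 7004 and as 7.0.4. A short sketch of that mapping, assuming the cuDNN 7.x encoding of major * 1000 + minor * 100 + patch level as defined in cudnn.h; other cuDNN releases may encode the number differently. `decode_cudnn_version` is an illustrative name.

```python
def decode_cudnn_version(value):
    """Split a cuDNN 7.x CUDNN_VERSION integer into (major, minor, patch)."""
    # ASSUMPTION: value == major * 1000 + minor * 100 + patch (cuDNN 7.x header).
    major, rest = divmod(value, 1000)
    minor, patch = divmod(rest, 100)
    return major, minor, patch


print(".".join(str(part) for part in decode_cudnn_version(7004)))
```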
seki@Ubuntu-Shi-Dell-Precision-M3800:/mnt/ShiJin/src/cuDNN/cudnn_samples_v7/mnistCUDNN$ python3
Python 3.5.2 (default, Sep 14 2017, 22:51:06)
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
Traceback (most recent call last):
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/pywrap_tensorflow.py", line 58, in <module>
from tensorflow.python.pywrap_tensorflow_internal import *
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 28, in <module>
_pywrap_tensorflow_internal = swig_import_helper()
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 24, in swig_import_helper
_mod = imp.load_module('_pywrap_tensorflow_internal', fp, pathname, description)
File "/usr/lib/python3.5/imp.py", line 242, in load_module
return load_dynamic(name, filename, file)
File "/usr/lib/python3.5/imp.py", line 342, in load_dynamic
return _load(spec)
ImportError: libcublas.so.8.0: cannot open shared object file: No such file or directory
seki@Ubuntu-Shi-Dell-Precision-M3800:~$ ls /usr/local/cuda/lib64/libcublas* -lh
-rw-r--r-- 1 root root 67M Sep 2 04:39 /usr/local/cuda/lib64/libcublas_device.a
lrwxrwxrwx 1 root root 16 Sep 2 04:40 /usr/local/cuda/lib64/libcublas.so -> libcublas.so.9.0
lrwxrwxrwx 1 root root 20 Sep 2 04:40 /usr/local/cuda/lib64/libcublas.so.9.0 -> libcublas.so.9.0.176
-rw-r--r-- 1 root root 51M Sep 2 04:39 /usr/local/cuda/lib64/libcublas.so.9.0.176
-rw-r--r-- 1 root root 57M Sep 2 04:39 /usr/local/cuda/lib64/libcublas_static.a
seki@Ubuntu-Shi-Dell-Precision-M3800:~$
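The import fails because this TensorFlow build asks for libcublas.so.8.0 while only the 9.0 library is installed. A minimal sketch that makes the mismatch explicit from the error text and the directory listing above; `missing_library` and `library_versions` are hypothetical helpers.

```python
import re


def missing_library(import_error_message):
    """Extract the shared-object name from a 'cannot open shared object file' message."""
    match = re.search(r"(lib[\w.+-]+?\.so[\d.]*):", import_error_message)
    return match.group(1) if match else None


def library_versions(filenames, stem):
    """Collect the version suffixes of files named <stem>.so.<version>."""
    prefix = stem + ".so."
    return sorted(name[len(prefix):] for name in filenames if name.startswith(prefix))


message = ("ImportError: libcublas.so.8.0: cannot open shared object file: "
           "No such file or directory")
installed = ["libcublas_device.a", "libcublas.so", "libcublas.so.9.0",
             "libcublas.so.9.0.176", "libcublas_static.a"]

wanted = missing_library(message)
stem, _, wanted_version = wanted.partition(".so.")
available = library_versions(installed, stem)
print("wanted:", wanted_version, "installed:", available)
print("match:", wanted_version in available)
```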
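Installed TensorFlow 1.5 and it works (the 1.5 binaries are built against CUDA 9.0, so libcublas.so.8.0 is no longer needed).
Remaining problem: CUDA compute capability 3.0.
The Linux tf-1.5 build does not support my GPU, but the Windows build does.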
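The check behind this limitation is a simple version comparison. The sketch below assumes a minimum compute capability of 3.5 for the prebuilt Linux binaries, as the TensorFlow 1.x install notes stated at the time; the threshold and the name `capability_supported` are assumptions for illustration, not taken from this thread.

```python
# ASSUMPTION: prebuilt Linux TensorFlow 1.x binaries required compute capability 3.5+.
MIN_CAPABILITY_FOR_BINARIES = (3, 5)


def capability_supported(capability, minimum=MIN_CAPABILITY_FOR_BINARIES):
    """Compare a 'major.minor' capability string against the minimum tuple."""
    major, minor = (int(part) for part in capability.split("."))
    return (major, minor) >= minimum


print(capability_supported("3.0"))  # Quadro K1100M, per deviceQuery above
print(capability_supported("3.5"))
```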
https://www.tensorflow.org/install/install_linux
sudo apt-get install libcupti-dev
sudo apt-get install python3-pip python3-dev
sudo pip3 install tensorflow-gpu
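After installing, it helps to confirm that TensorFlow actually sees the GPU. A minimal sketch using `device_lib.list_local_devices()` from TensorFlow 1.x; `gpu_names` and `local_gpus` are illustrative helpers, and the fake device list stands in for a real machine so the filtering can be tried without a GPU.

```python
from types import SimpleNamespace


def gpu_names(devices):
    """Keep the names of devices whose device_type is 'GPU'."""
    return [device.name for device in devices if device.device_type == "GPU"]


def local_gpus():
    """Return GPU device names seen by TensorFlow, or None if it cannot be imported."""
    try:
        from tensorflow.python.client import device_lib
    except ImportError:
        # Also covers the missing libcublas case shown earlier in this thread.
        return None
    return gpu_names(device_lib.list_local_devices())


# Stand-in objects with the two attributes the filter reads.
fake_devices = [
    SimpleNamespace(name="/device:CPU:0", device_type="CPU"),
    SimpleNamespace(name="/device:GPU:0", device_type="GPU"),
]
print(gpu_names(fake_devices))
```

On a real installation, calling `local_gpus()` returns an empty list when TensorFlow imports but finds no usable GPU.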
Version Notes