cboettig opened this issue 1 year ago
I'm working on a CUDA (Keras/TensorFlow) setup for aging NVIDIA Tesla K80 Azure datacenter GPUs (2014).
These GPUs are rather old and close to being retired by Microsoft, but they are very cheap and useful for the educational VMs used by students at our institution.
The supported NVIDIA kernel driver is branch 470. Inside the `rocker/ml:4.2.3` container, `nvidia-smi` reports:
| NVIDIA-SMI 470.182.03 Driver Version: 470.182.03 CUDA Version: 11.8 |
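According to the docs, the 470 driver should be supported by all CUDA 11.x releases.
But comparing repositories, the 470 (user-mode) driver is missing from the first listing below (which ends with `cuda-compat-12-1_530.30.02`). The second listing, which starts with the `cuda-compat-11-4_470.*` packages, includes it: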
cuda-compat-11-7_515.43.04-1_amd64.deb
cuda-compat-11-7_515.48.07-1_amd64.deb
cuda-compat-11-7_515.65.01-1_amd64.deb
cuda-compat-11-7_515.65.07-1_amd64.deb
cuda-compat-11-7_515.86.01-1_amd64.deb
cuda-compat-11-7_515.105.01-1_amd64.deb
cuda-compat-11-8_520.61.05-1_amd64.deb
cuda-compat-12-0_525.60.13-1_amd64.deb
cuda-compat-12-0_525.85.12-1_amd64.deb
cuda-compat-12-0_525.105.17-1_amd64.deb
cuda-compat-12-1_530.30.02-1_amd64.deb
cuda-compat-11-4_470.42.01-1_amd64.deb
cuda-compat-11-4_470.57.02-1_amd64.deb
cuda-compat-11-4_470.82.01-1_amd64.deb
cuda-compat-11-4_470.103.01-1_amd64.deb
cuda-compat-11-4_470.129.06-1_amd64.deb
cuda-compat-11-4_470.141.03-1_amd64.deb
cuda-compat-11-4_470.141.10-1_amd64.deb
cuda-compat-11-4_470.161.03-1_amd64.deb
cuda-compat-11-4_470.182.03-1_amd64.deb
cuda-compat-11-5_495.29.05-1_amd64.deb
cuda-compat-11-6_510.39.01-1_amd64.deb
cuda-compat-11-6_510.47.03-1_amd64.deb
cuda-compat-11-6_510.73.08-1_amd64.deb
cuda-compat-11-6_510.84-1_amd64.deb
cuda-compat-11-6_510.85.02-1_amd64.deb
cuda-compat-11-6_510.108.03-1_amd64.deb
cuda-compat-11-7_515.43.04-1_amd64.deb
cuda-compat-11-7_515.48.07-1_amd64.deb
cuda-compat-11-7_515.65.01-1_amd64.deb
cuda-compat-11-7_515.65.07-1_amd64.deb
cuda-compat-11-7_515.86.01-1_amd64.deb
cuda-compat-11-7_515.105.01-1_amd64.deb
cuda-compat-11-8_520.61.05-1_amd64.deb
cuda-compat-12-0_525.60.13-1_amd64.deb
cuda-compat-12-0_525.85.12-1_amd64.deb
cuda-compat-12-0_525.105.17-1_amd64.deb
cuda-compat-12-1_530.30.02-1_amd64.deb
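Two such listings can be compared without eyeballing them. The small Python sketch below groups `cuda-compat` package names by driver branch and reports the branches present in one listing but not the other. It assumes the listings are saved as plain filenames, exactly as above. The function names are illustrative and not part of any NVIDIA tool.

```python
import re
from collections import defaultdict

# Matches names such as "cuda-compat-11-4_470.182.03-1_amd64.deb"
COMPAT_RE = re.compile(
    r"^cuda-compat-(?P<major>\d+)-(?P<minor>\d+)_(?P<driver>\d+(?:\.\d+)*)-\d+_amd64\.deb$"
)


def driver_branches(filenames):
    """Map each driver branch (e.g. "470") to the sorted CUDA releases shipping it."""
    branches = defaultdict(set)
    for name in filenames:
        m = COMPAT_RE.match(name.strip())
        if m is None:
            continue  # not a cuda-compat package name
        branch = m.group("driver").split(".")[0]
        branches[branch].add(f"{m.group('major')}.{m.group('minor')}")
    return {branch: sorted(releases) for branch, releases in branches.items()}


def missing_branches(listing_a, listing_b):
    """Driver branches that appear in listing_b but not in listing_a."""
    return sorted(set(driver_branches(listing_b)) - set(driver_branches(listing_a)))


first = [
    "cuda-compat-11-8_520.61.05-1_amd64.deb",
    "cuda-compat-12-1_530.30.02-1_amd64.deb",
]
second = first + ["cuda-compat-11-4_470.182.03-1_amd64.deb"]
print(missing_branches(first, second))  # ['470']
```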
I tried the `rocker/ml:4.2.3` container, based on CUDA 11.8 (Ubuntu 22.04), but it does not work:
2023-04-14 18:04:34.360427: I tensorflow/compiler/xla/stream_executor/cuda/cuda_diagnostics.cc:169] retrieving CUDA diagnostic information for host: a5d8ae1ed794
2023-04-14 18:04:34.360439: I tensorflow/compiler/xla/stream_executor/cuda/cuda_diagnostics.cc:176] hostname: a5d8ae1ed794
2023-04-14 18:04:34.360546: I tensorflow/compiler/xla/stream_executor/cuda/cuda_diagnostics.cc:200] libcuda reported version is: 520.61.5
2023-04-14 18:04:34.360587: I tensorflow/compiler/xla/stream_executor/cuda/cuda_diagnostics.cc:204] kernel reported version is: 470.182.3
2023-04-14 18:04:34.360598: E tensorflow/compiler/xla/stream_executor/cuda/cuda_diagnostics.cc:313] kernel version 470.182.3 does not match DSO version 520.61.5 -- cannot find working devices in this configuration
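The last line is the key one. The user-mode `libcuda` (520) and the host kernel module (470) come from different driver branches. The minimal sketch below pulls both versions out of such a log and flags a branch mismatch. Comparing only the first version field is a simplification of our own, not TensorFlow's exact rule.

```python
import re

# The two diagnostic lines TensorFlow logs when it cannot find a working device.
DSO_RE = re.compile(r"libcuda reported version is: (\d+(?:\.\d+)*)")
KERNEL_RE = re.compile(r"kernel reported version is: (\d+(?:\.\d+)*)")


def reported_versions(log_text):
    """Return (libcuda version, kernel module version); None where a line is missing."""
    dso = DSO_RE.search(log_text)
    kernel = KERNEL_RE.search(log_text)
    return (dso.group(1) if dso else None, kernel.group(1) if kernel else None)


def branch_mismatch(log_text):
    """True when both versions are present and their driver branches differ."""
    dso, kernel = reported_versions(log_text)
    if dso is None or kernel is None:
        return False
    return dso.split(".")[0] != kernel.split(".")[0]


log = """\
... cuda_diagnostics.cc:200] libcuda reported version is: 520.61.5
... cuda_diagnostics.cc:204] kernel reported version is: 470.182.3
"""
print(reported_versions(log))  # ('520.61.5', '470.182.3')
print(branch_mismatch(log))  # True
```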
I also tried the `rocker/ml:4.2.1-cuda11.1` container, based on CUDA 11.1 (Ubuntu 20.04), but ran into some problems:
A possible solution for CUDA 11 support on the 470 branch:
@hute37 Thanks for digging into this. As you can see, we're still figuring out the best strategy for handling CUDA versioning in this stack.
I haven't had a chance to investigate here. This is a great start, but we will need to dig a bit deeper. As you know, there are at least three moving parts in the versioning scheme that we need to triangulate:
Obviously, rocker can only directly select versions in the third category. My issue up top referenced the second of these, but I think the right solution there is to recommend that users update their host drivers, rather than attempting to support all drivers. But on to your issue: it would definitely be nice to retain support for older hardware. I'm a bit puzzled why the cuda11.1 setup is not viable here. Could it be due more to how `rocker/ml:4.2.1-cuda11.1` is built than to CUDA itself? As you've noticed, that version and earlier versions of the rocker CUDA stack added the CUDA libs on top of the `r-ver` base image using custom scripts based on NVIDIA's containers. The current CUDA 11.8 script instead uses the official NVIDIA CUDA Ubuntu-based images as its base image. (This was because, at the time, NVIDIA only provided Ubuntu 18.04 base images.)
So rather than proliferating too many tags, it might be better to see about patching the cuda11.1 image correctly for this. For the libnvinfer issue, do you need libnvinfer8? IIRC, that release would have been aligned with libnvinfer7, no? I'm also not sure about the libdevice and cuDNN errors. It's not clear to me that 11.1 vs 11.4 is really the source of the problem there.
Maybe the question is easy to formulate, but the answer is not ...
Q: "What is the latest TensorFlow version that can be used (by CRAN keras) with obsolete (470-driver line) NVIDIA GPUs?"
Because of stack dependencies, the answer depends on several sub-questions:
- 470 kernel drivers (in the host OS) require a cuda-compat-470 user-mode driver of the same version in the container
- to support containers, the NVIDIA Container Toolkit is also required
- 470 driver availability in the NVIDIA repositories seems to be limited to Ubuntu 20.04, with no support for 22.04
- the choice of CUDA version is critical: a single CUDA 11.x release should be selected (10.x is rather old, 12.x is unsupported):
  - the libnvinfer package (strange ...)
  - the cuda-compat-11-4 user driver (required to match the 470 kernel driver)
- the latest CRAN keras package installs TensorFlow 2.11 by default
- cuDNN >= 8.6 seems to be a prerequisite for TensorFlow 2.11
- nvBLAS/cuBLAS (>= 11.*) integration with OpenBLAS
- optional TensorRT support (?)
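The constraints in this list can be written down as a small checker, so that candidate stacks can be compared before installing anything. This sketch encodes only the rules stated above:

- matching driver branches
- CUDA 11.x on the 470 branch
- cuDNN >= 8.6 for TensorFlow 2.11

The `Stack` type and its fields are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Stack:
    kernel_driver: str  # host kernel driver, e.g. "470.182.03"
    user_driver: str    # user-mode libcuda (e.g. from cuda-compat), e.g. "470.182.03"
    cuda: str           # CUDA toolkit/libraries, e.g. "11.8"
    cudnn: str          # e.g. "8.6.0"
    tensorflow: str     # e.g. "2.11"


def _ver(text):
    return tuple(int(part) for part in text.split("."))


def problems(stack):
    """Return the rules from the list above that this stack violates."""
    issues = []
    if _ver(stack.kernel_driver)[0] != _ver(stack.user_driver)[0]:
        issues.append("user-mode driver branch does not match the kernel driver branch")
    if _ver(stack.kernel_driver)[0] == 470 and _ver(stack.cuda)[0] != 11:
        issues.append("the 470 branch needs a CUDA 11.x release")
    if _ver(stack.tensorflow)[:2] == (2, 11) and _ver(stack.cudnn)[:2] < (8, 6):
        issues.append("TensorFlow 2.11 seems to need cuDNN >= 8.6")
    return issues


working = Stack("470.182.03", "470.182.03", "11.8", "8.6.0", "2.11")
container = Stack("470.182.03", "520.61.05", "11.8", "8.6.0", "2.11")
print(problems(working))    # []
print(problems(container))  # ['user-mode driver branch does not match the kernel driver branch']
```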
While NVIDIA declares full support for the 470 line up to CUDA 12.2, some combinations are not available in the NVIDIA repositories, in particular for obsolete hardware.
I'd prefer apt-based installations, but maybe another installation method could fill the gaps?
I would like to avoid conda/miniconda stacks, because I need to interoperate with projects based on pyenv/poetry Python environments.
ChatGPT wasn't helpful ...
Thanks, this is definitely helpful.
NVIDIA obviously isn't making it easy for us by insisting that
while at the same time insisting that
The first choice makes sense to me, in that it allows users to still run older software and newer software by staying current on their drivers.
The second choice seems unfortunate, and is basically saying that if you want to use old hardware you'll be stuck on old software too. (Obviously that's financially in the interest of a company selling new hardware and may contribute to Microsoft's choice here too).
So I think this also supports your formulation of the question: the only way forward on old hardware is to lock in an old version of all the software as well, including an old version of keras, tensorflow, and cuda toolkit. Does that sound accurate?
OK, so now for the nuts and bolts. Given the above, I don't think it will be viable to look for a solution that takes the default TensorFlow version from the current CRAN version of keras as the constraint. It's not clear from the above that TensorFlow 2.11 was intended for a driver 470 / CUDA 11.1 / Ubuntu 20.04 environment.
I don't have a machine running the 470 drivers available, so I can't help much with checking things here, but could you try some earlier versions of TensorFlow? (In particular, I'm not clear on the history of the libnvinfer libs here; they may have been introduced only later?)
It worked! But in a "manual (hammered)", non-containerized setup ...
One point is compatibility between the (host) kernel driver from the 470 line and the user-mode CUDA driver. The user-mode driver comes from `cuda-compat-470` and is prepended in `LD_LIBRARY_PATH`, so it is found before the "standard" 5xx CUDA driver.
For containers, the NVIDIA Container Toolkit is also required.
With this setup, I could install the CUDA 11.8 libraries. I successfully tested a simple array multiplication performed on the GPU.
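One way to confirm which user-mode driver actually wins the library search is to ask it directly. The sketch below loads `libcuda.so.1` through the normal loader path, so `LD_LIBRARY_PATH` and the ld.so cache apply. It then calls the CUDA driver API function `cuDriverGetVersion`, which reports the CUDA version that driver supports. With the 470 compat library first, one would expect (11, 4), matching `nvidia-smi` on the host. This is an illustration, not part of the setup described here.

```python
import ctypes


def decode_cuda_version(value):
    """The driver API encodes versions as 1000 * major + 10 * minor, e.g. 11040 -> (11, 4)."""
    return value // 1000, (value % 1000) // 10


def loaded_driver_cuda_version(soname="libcuda.so.1"):
    """Return (major, minor) supported by the libcuda the loader picks, or None."""
    try:
        libcuda = ctypes.CDLL(soname)
    except OSError:
        return None  # no NVIDIA user-mode driver visible to the loader
    version = ctypes.c_int(0)
    if libcuda.cuDriverGetVersion(ctypes.byref(version)) != 0:  # 0 == CUDA_SUCCESS
        return None
    return decode_cuda_version(version.value)


if __name__ == "__main__":
    print(loaded_driver_cuda_version())
```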
Another very important topic is cuDNN support.
cuDNN is the library where the ML GPU magic happens, and TF is tightly linked to it.
cuDNN is built around GPU compute capabilities, identified by a level code. For instance, the Tesla (Kepler) K80 has compute capability 3.7.
Your hardware supports a fixed compute capability level. That level caps the maximum version of cuDNN that supports the GPU, which in turn caps the maximum version of TensorFlow that can be used on that system.
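The compute capability can be read from the driver itself before choosing cuDNN and TensorFlow versions. The sketch below uses Python's `ctypes` and the CUDA driver API (`cuInit`, `cuDeviceGet`, `cuDeviceGetAttribute`). The 3.5 threshold in the example is the level that, as noted later in this thread, NVIDIA declares for CUDA 11. It is not a cuDNN or TensorFlow requirement.

```python
import ctypes

# CUdevice_attribute enum values from cuda.h
CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MAJOR = 75
CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MINOR = 76


def compute_capability(ordinal=0, soname="libcuda.so.1"):
    """Return (major, minor) for GPU `ordinal`, or None if no usable driver/GPU is found."""
    try:
        cuda = ctypes.CDLL(soname)
    except OSError:
        return None
    if cuda.cuInit(0) != 0:
        return None
    device = ctypes.c_int()
    if cuda.cuDeviceGet(ctypes.byref(device), ordinal) != 0:
        return None
    major, minor = ctypes.c_int(), ctypes.c_int()
    for out, attr in ((major, CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MAJOR),
                      (minor, CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MINOR)):
        if cuda.cuDeviceGetAttribute(ctypes.byref(out), attr, device) != 0:
            return None
    return major.value, minor.value


def at_least(cc, minimum):
    """Compute capabilities compare as (major, minor) tuples."""
    return tuple(cc) >= tuple(minimum)


print(at_least((3, 7), (3, 5)))  # True: a K80 meets a 3.5 floor
print(at_least((3, 7), (5, 0)))  # False
```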
A working configuration ...
» inxi -b
System:
Host: xt-si701-v01 Kernel: 5.15.0-1035-azure x86_64 bits: 64
Desktop: Xfce 4.16.0 Distro: Ubuntu 22.04.2 LTS (Jammy Jellyfish)
Machine:
Type: Desktop Mobo: Microsoft model: Virtual Machine v: 7.0
serial: <superuser required> BIOS: American Megatrends v: 090007
date: 06/02/2017
CPU:
Info: 6-core Intel Xeon E5-2690 v3 [MCP] speed (MHz): avg: 2597
Graphics:
Device-1: Microsoft Hyper-V virtual VGA driver: hyperv_drm v: kernel
Device-2: NVIDIA GK210GL [Tesla K80] driver: nvidia v: 470.182.03
Display: x11 server: X.Org v: 1.20.9 driver: X: loaded: N/A
unloaded: modesetting gpu: hyperv_drm note: X driver n/a
resolution: 1920x1080~60Hz
OpenGL: renderer: llvmpipe (LLVM 15.0.6 256 bits) v: 4.5 Mesa 22.2.5
» lspci | grep NVIDIA
0001:00:00.0 3D controller: NVIDIA Corporation GK210GL [Tesla K80] (rev a1)
» nvidia-smi
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.182.03 Driver Version: 470.182.03 CUDA Version: 11.4 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 Tesla K80 Off | 00000001:00:00.0 Off | 0 |
| N/A 38C P0 81W / 149W | 0MiB / 11441MiB | 55% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
Drivers and Container Toolkit: installed with apt
» apt list --installed | grep -i -e nvidia -e cuda
WARNING: apt does not have a stable CLI interface. Use with caution in scripts.
cuda-keyring/unknown,now 1.0-1 all [installed]
libnvidia-cfg1-470/jammy-updates,jammy-security,now 470.182.03-0ubuntu0.22.04.1 amd64 [installed,automatic]
libnvidia-common-470/jammy-updates,jammy-security,now 470.182.03-0ubuntu0.22.04.1 all [installed,automatic]
libnvidia-compute-470/jammy-updates,jammy-security,now 470.182.03-0ubuntu0.22.04.1 amd64 [installed,automatic]
libnvidia-container-tools/unknown,now 1.13.0-1 amd64 [installed,automatic]
libnvidia-container1/unknown,now 1.13.0-1 amd64 [installed,automatic]
libnvidia-decode-470/jammy-updates,jammy-security,now 470.182.03-0ubuntu0.22.04.1 amd64 [installed,automatic]
libnvidia-egl-wayland1/jammy,now 1:1.1.9-1.1 amd64 [installed,automatic]
libnvidia-encode-470/jammy-updates,jammy-security,now 470.182.03-0ubuntu0.22.04.1 amd64 [installed,automatic]
libnvidia-extra-470/jammy-updates,jammy-security,now 470.182.03-0ubuntu0.22.04.1 amd64 [installed,automatic]
libnvidia-fbc1-470/jammy-updates,jammy-security,now 470.182.03-0ubuntu0.22.04.1 amd64 [installed,automatic]
libnvidia-gl-470/jammy-updates,jammy-security,now 470.182.03-0ubuntu0.22.04.1 amd64 [installed,automatic]
libnvidia-ifr1-470/jammy-updates,jammy-security,now 470.182.03-0ubuntu0.22.04.1 amd64 [installed,automatic]
mate-sensors-applet-nvidia/jammy,now 1.26.0-1 amd64 [installed]
nvidia-compute-utils-470/jammy-updates,jammy-security,now 470.182.03-0ubuntu0.22.04.1 amd64 [installed,automatic]
nvidia-container-toolkit-base/unknown,now 1.13.0-1 amd64 [installed]
nvidia-container-toolkit/unknown,now 1.13.0-1 amd64 [installed]
nvidia-dkms-470/jammy-updates,jammy-security,now 470.182.03-0ubuntu0.22.04.1 amd64 [installed,automatic]
nvidia-driver-470/jammy-updates,jammy-security,now 470.182.03-0ubuntu0.22.04.1 amd64 [installed]
nvidia-kernel-common-470/jammy-updates,jammy-security,now 470.182.03-0ubuntu0.22.04.1 amd64 [installed,automatic]
nvidia-kernel-source-470/jammy-updates,jammy-security,now 470.182.03-0ubuntu0.22.04.1 amd64 [installed,automatic]
nvidia-prime/jammy,now 0.8.17.1 all [installed,automatic]
nvidia-settings/unknown,now 530.30.02-0ubuntu1 amd64 [installed,automatic]
nvidia-utils-470/jammy-updates,jammy-security,now 470.182.03-0ubuntu0.22.04.1 amd64 [installed]
xserver-xorg-video-nvidia-470/jammy-updates,jammy-security,now 470.182.03-0ubuntu0.22.04.1 amd64 [installed,automatic]
Manual Installation in /usr/local/cuda
I manually downloaded installation packages:
cuda_11.8.0_520.61.05_linux.run
cuda-compat-11-4_470.182.03-1_amd64.deb
cudnn-linux-x86_64-8.6.0.163_cuda11-archive.tar.xz
libnvinfer8_8.5.3-1+cuda11.8_amd64.deb
libnvinfer-dev_8.5.3-1+cuda11.8_amd64.deb
libnvinfer-plugin8_8.5.3-1+cuda11.8_amd64.deb
libnvinfer-plugin-dev_8.5.3-1+cuda11.8_amd64.deb
What | Version | Mode | Source |
---|---|---|---|
Kernel Driver | 470.182.03 | apt | nvidia-driver-470 |
CUDA Driver | 470.* | local | cuda-compat-11-4_470.182.03-* |
CUDA Libraries | 11-8 | local | cuda_11.8.0_520*_linux.run |
libnvinfer | 8.5.3-* | local | libnvinfer8_8.5.3 (+plugin8, + dev) |
cuDNN | 8.6.0.* | local | cudnn-linux-x86_64-8.6.0.163_cuda11-archive.tar.xz |
TensorFlow | 2.12.0 | PyPI | poetry |
The following files are also required under /etc/ld.so.conf.d:
$ cd /etc/ld.so.conf.d
$ ls -1 1*cuda*
114_cuda-11-compat.conf
118_cuda-11-local.conf
$ cat 1*cuda*
/usr/local/cuda/compat
#/usr/local/cuda-11.8/lib64
/usr/local/cuda/lib64
#/usr/local/cuda-11.8/lib64
# update cache
sudo ldconfig
Check Libraries:
$ ldconfig -p | grep -i -e 'libcuda.so' -e 'lib..blas.*.so' -e 'libcudnn.so' -e 'libnvinfer.*.so' -e 'libcudnn.*.so'
libnvinfer_plugin.so.8 (libc6,x86-64) => /usr/local/cuda/lib64/libnvinfer_plugin.so.8
libnvinfer_plugin.so (libc6,x86-64) => /usr/local/cuda/lib64/libnvinfer_plugin.so
libnvinfer.so.8 (libc6,x86-64) => /usr/local/cuda/lib64/libnvinfer.so.8
libnvinfer.so (libc6,x86-64) => /usr/local/cuda/lib64/libnvinfer.so
libnvblas.so.11 (libc6,x86-64) => /usr/local/cuda/lib64/libnvblas.so.11
libnvblas.so (libc6,x86-64) => /usr/local/cuda/lib64/libnvblas.so
libcudnn_ops_train.so.8 (libc6,x86-64) => /usr/local/cuda/lib64/libcudnn_ops_train.so.8
libcudnn_ops_train.so (libc6,x86-64) => /usr/local/cuda/lib64/libcudnn_ops_train.so
libcudnn_ops_infer.so.8 (libc6,x86-64) => /usr/local/cuda/lib64/libcudnn_ops_infer.so.8
libcudnn_ops_infer.so (libc6,x86-64) => /usr/local/cuda/lib64/libcudnn_ops_infer.so
libcudnn_cnn_train.so.8 (libc6,x86-64) => /usr/local/cuda/lib64/libcudnn_cnn_train.so.8
libcudnn_cnn_train.so (libc6,x86-64) => /usr/local/cuda/lib64/libcudnn_cnn_train.so
libcudnn_cnn_infer.so.8 (libc6,x86-64) => /usr/local/cuda/lib64/libcudnn_cnn_infer.so.8
libcudnn_cnn_infer.so (libc6,x86-64) => /usr/local/cuda/lib64/libcudnn_cnn_infer.so
libcudnn_adv_train.so.8 (libc6,x86-64) => /usr/local/cuda/lib64/libcudnn_adv_train.so.8
libcudnn_adv_train.so (libc6,x86-64) => /usr/local/cuda/lib64/libcudnn_adv_train.so
libcudnn_adv_infer.so.8 (libc6,x86-64) => /usr/local/cuda/lib64/libcudnn_adv_infer.so.8
libcudnn_adv_infer.so (libc6,x86-64) => /usr/local/cuda/lib64/libcudnn_adv_infer.so
libcudnn.so.8 (libc6,x86-64) => /usr/local/cuda/lib64/libcudnn.so.8
libcudnn.so (libc6,x86-64) => /usr/local/cuda/lib64/libcudnn.so
libcuda.so.1 (libc6,x86-64) => /usr/local/cuda/compat/libcuda.so.1
libcuda.so.1 (libc6,x86-64) => /lib/x86_64-linux-gnu/libcuda.so.1
libcuda.so (libc6,x86-64) => /usr/local/cuda/compat/libcuda.so
libcuda.so (libc6,x86-64) => /lib/x86_64-linux-gnu/libcuda.so
libcublasLt.so.11 (libc6,x86-64) => /usr/local/cuda/lib64/libcublasLt.so.11
libcublasLt.so (libc6,x86-64) => /usr/local/cuda/lib64/libcublasLt.so
libcublas.so.11 (libc6,x86-64) => /usr/local/cuda/lib64/libcublas.so.11
libcublas.so (libc6,x86-64) => /usr/local/cuda/lib64/libcublas.so
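In this listing, `libcuda.so.1` appears twice, with the compat copy first. This working setup suggests that the first entry printed for a soname is the one the dynamic loader resolves. Assuming that holds, a quick check can be scripted. `resolve_order` is a hypothetical helper. Note that `ldconfig` may live in `/sbin` and may not be on a non-root `PATH`.

```python
import subprocess


def resolve_order(soname, ldconfig_output):
    """Paths listed for `soname`, in the order `ldconfig -p` prints them."""
    paths = []
    for line in ldconfig_output.splitlines():
        left, arrow, right = line.partition("=>")
        if not arrow:
            continue  # header line ("N libs found in cache ...")
        fields = left.split()
        if fields and fields[0] == soname:
            paths.append(right.strip())
    return paths


def ldconfig_listing():
    """Current cache listing (requires ldconfig on PATH)."""
    return subprocess.run(["ldconfig", "-p"], capture_output=True, text=True, check=True).stdout


sample = """\
\tlibcuda.so.1 (libc6,x86-64) => /usr/local/cuda/compat/libcuda.so.1
\tlibcuda.so.1 (libc6,x86-64) => /lib/x86_64-linux-gnu/libcuda.so.1
\tlibcuda.so (libc6,x86-64) => /usr/local/cuda/compat/libcuda.so
"""
print(resolve_order("libcuda.so.1", sample)[0].startswith("/usr/local/cuda/compat/"))  # True
```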
A simple Python test script:
» python exec/dummy-gpu-tf.py
2023-04-19 18:32:53.603349: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
Num GPUs Available: 1
GPUs: [PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
2023-04-19 18:33:01.566433: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1635] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 10766 MB memory: -> device: 0, name: Tesla K80, pci bus id: 0001:00:00.0, compute capability: 3.7
...
Epoch 1/15
2023-04-19 18:33:03.653302: I tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:424] Loaded cuDNN version 8600
2023-04-19 18:33:04.129065: I tensorflow/compiler/xla/service/service.cc:169] XLA service 0x7f52e400cf70 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2023-04-19 18:33:04.129092: I tensorflow/compiler/xla/service/service.cc:177] StreamExecutor device (0): Tesla K80, Compute Capability 3.7
2023-04-19 18:33:04.135966: I tensorflow/compiler/mlir/tensorflow/utils/dump_mlir_util.cc:269] disabling MLIR crash reproducer, set env var `MLIR_CRASH_REPRODUCER_DIRECTORY` to enable.
2023-04-19 18:33:04.308010: I ./tensorflow/compiler/jit/device_compiler.h:180] Compiled cluster using XLA! This line is logged at most once for the lifetime of the process.
422/422 [==============================] - 6s 8ms/step - loss: 0.3709 - accuracy: 0.8885 - val_loss: 0.0797 - val_accuracy: 0.9798
...
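The contents of `exec/dummy-gpu-tf.py` are not shown here. Below is a hypothetical stand-in in the spirit of the TensorFlow "Use a GPU" guide. It lists the GPUs, reads the compute capability TensorFlow sees, and runs one matrix product pinned to the first GPU. It returns None when TensorFlow is not installed, so it is safe to run anywhere.

```python
import importlib.util


def gpu_smoke_test():
    """Return a small report about TensorFlow's GPU view, or None without TensorFlow."""
    if importlib.util.find_spec("tensorflow") is None:
        return None
    import tensorflow as tf

    gpus = tf.config.list_physical_devices("GPU")
    report = {"num_gpus": len(gpus), "compute_capability": None, "matmul_device": None}
    if gpus:
        details = tf.config.experimental.get_device_details(gpus[0])
        report["compute_capability"] = details.get("compute_capability")  # e.g. (3, 7) on a K80
        with tf.device("/GPU:0"):
            a = tf.random.normal((1024, 1024))
            product = tf.matmul(a, a)
        report["matmul_device"] = product.device
    return report


if __name__ == "__main__":
    print(gpu_smoke_test())
```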
Building a container is a different matter ...
Maybe a multi-stage container build could be used to grab the CUDA parts from different base images (?)
It worked!
Test based on the TensorFlow guide "Use a GPU":
Base Image | Version |
---|---|
ml | 4.2.3 |
ml-verse | 4.2.3 |
Virtual Machine | GPU Card | GPU Family | NVIDIA driver version | compute capabilities |
---|---|---|---|---|
Azure NC6 | K80 | Tesla/Kepler | 470.* | 3.7 |
Azure NV12 | M60 | Tesla/Maxwell | 470.* | 5.2 |
Component | Name | version | note |
---|---|---|---|
Operating System | Ubuntu | 18.04, 22.04 | |
Container Environment | Podman | 3.4.2 | rootless mode |
Container Runtime | crun | | |
Container Storage | fuse-overlay | | under Ubuntu 18.04, rootless mode requires fuse-overlay |
NVIDIA Kernel Driver | nvidia-driver-470 | 470.182.03 | dkms apt-get install from NVIDIA repo |
NVIDIA Compute Driver | libnvidia-compute-470 | 470.182.03 | cuda-11 apt-get install from NVIDIA repo |
NVIDIA Container Toolkit | nvidia-container-toolkit | 1.13.1-1 | cuda-11 apt-get install from NVIDIA repo |
Component | Name | version | note |
---|---|---|---|
Operating System | Ubuntu | 22.04 | |
Framework | CUDA-11 | 11.8 | |
"Compat" Libraries | - | - | removed from base image, in conflict with host container toolkit |
CUDA Compiler | NVCC | - | installed, required for bytecode generation |
ML DNN support | cuDNN | 8.6 | downgraded for GPU (3.7 level) compatibility support |
BLAS | nvblas/cublas | - | unsupported: this library cannot be enabled in the build phase, during install.r package setup |
Component | Name | version | note |
---|---|---|---|
Environment manager | pyenv | | |
Language Interpreter | python | 3.10.6 | pyenv install |
Package Manager | poetry | | pyproject.toml definition |
ML Framework | TensorFlow | 2.11 | |
ML Modelling | Keras | | |
@hute37 Hey, well done, that's pretty cool! So it looks like dropping the compat libraries and rolling cuDNN back to 8.6 was key? Nicely written install script, thanks for sharing!
@cboettig
Matching cuDNN's compute capability requirements to your GPU is critical. NVIDIA declares support for level 3.5+ (and the 470 driver) in all CUDA 11 releases, but later components seem to break that support.
The NVCC compiler is also a requirement. I have read something about the bytecode cache being suppressed for some GPUs, so the compiler must be present to regenerate the cache on the fly (with a noticeable startup delay?)
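It is not confirmed here where the startup delay comes from. If it comes from JIT-compiling PTX kernels for compute capability 3.7, the CUDA JIT compute cache can keep compiled kernels between runs. NVIDIA documents three environment variables for it: `CUDA_CACHE_PATH`, `CUDA_CACHE_MAXSIZE` and `CUDA_CACHE_DISABLE`. The sketch below sets them from Python and is meant to be called before TensorFlow is imported. The cache location and size are arbitrary choices.

```python
import os
from pathlib import Path


def configure_jit_cache(cache_dir="~/.nv/ComputeCache", max_bytes=2 * 1024**3):
    """Keep CUDA's JIT cache in a persistent directory with a larger size limit.

    Must be called before the CUDA driver is initialised in this process
    (e.g. before `import tensorflow`) for the settings to take effect.
    """
    path = Path(cache_dir).expanduser()
    path.mkdir(parents=True, exist_ok=True)
    os.environ["CUDA_CACHE_PATH"] = str(path)
    os.environ["CUDA_CACHE_MAXSIZE"] = str(max_bytes)  # bytes
    os.environ.pop("CUDA_CACHE_DISABLE", None)  # make sure caching is not switched off
    return path
```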
`libcuda-compat` is a strange thing ...
Early CUDA support in containers exported the GPU device directly, and the device was handled inside the container.
The more recent model (NVIDIA Container Toolkit) keeps the image agnostic in terms of GPU model and availability: in a CI/CD pipeline, many systems with different GPUs share the same image. It is critical that the host running the container "injects" (mounts) the right driver at runtime. So far, so good ... The strange fact was that, if the compat library was present in the container, the internal driver (520) took precedence over the injected one (470).
One thing is still missing: BLAS/LAPACK support ...
I tried to patch the R/Rscript Renviron to enable the LD_PRELOAD trick, linking the nvblas/cublas library in front of the standard OpenBLAS (what about MKL?).
The problem here is that the GPU is not available during the image build phase, and all `install.r` calls generated a lot of warnings.
Runtime inclusion of nvblas would be better but is rather complex ...
In terms of support lines, maybe a cuda-11 (470/520 driver compatible) base image line could be a nice addition, while the "latest and greatest" images could support cuda-12+ and driver 520 only.
> Runtime inclusion of nvblas would be better but is rather complex ...
To provide NVBLAS-enabled `R` and `Rscript`:
cp -a $(which R) $(which R)_
echo '#!/bin/bash' > $(which R)
echo "command -v nvidia-smi >/dev/null && nvidia-smi -L | grep 'GPU[[:space:]]\?[[:digit:]]\+' >/dev/null && export LD_PRELOAD=libnvblas.so" >> $(which R)
echo "$(which R_) \"\${@}\"" >> $(which R)
cp -a $(which Rscript) $(which Rscript)_
echo '#!/bin/bash' > $(which Rscript)
echo "command -v nvidia-smi >/dev/null && nvidia-smi -L | grep 'GPU[[:space:]]\?[[:digit:]]\+' >/dev/null && export LD_PRELOAD=libnvblas.so" >> $(which Rscript)
echo "$(which Rscript_) \"\${@}\"" >> $(which Rscript)
👉 Enabled at runtime, and only if `nvidia-smi` and at least one GPU are present.
(`LD_LIBRARY_PATH=/usr/local/nvidia/lib:/usr/local/nvidia/lib64:/usr/local/cuda/lib:/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64`)
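The same runtime gate can also be expressed outside the shell wrapper, for example in a Python-driven job launcher. This sketch mirrors the wrapper's logic: it looks for an `nvidia-smi -L` line matching `GPU`, optional whitespace, then digits. Only if it finds one does it set `LD_PRELOAD=libnvblas.so` for the child process. The script name `nvblas_launch.py` and the function names are illustrative.

```python
import os
import re
import shutil
import subprocess
import sys

# Same test as the grep in the wrapper: "GPU", optional whitespace, one or more digits.
GPU_LINE = re.compile(r"GPU\s?\d+")


def has_gpu(listing):
    """True if any line of `nvidia-smi -L` output names a GPU."""
    return any(GPU_LINE.search(line) for line in listing.splitlines())


def nvblas_env(base_env=None):
    """Copy of the environment, with LD_PRELOAD=libnvblas.so only when a GPU is visible."""
    env = dict(os.environ if base_env is None else base_env)
    if shutil.which("nvidia-smi") is None:
        return env
    result = subprocess.run(["nvidia-smi", "-L"], capture_output=True, text=True)
    if has_gpu(result.stdout):
        env["LD_PRELOAD"] = "libnvblas.so"
    return env


if __name__ == "__main__":
    # Usage: python nvblas_launch.py Rscript -e 'sessionInfo()'
    if len(sys.argv) > 1:
        os.execvpe(sys.argv[1], sys.argv[1:], nvblas_env())
```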
Run some benchmarks to ensure that NVBLAS actually outperforms OpenBLAS.
ℹ️ In some of my setups it does not. That is why I decided to provide NVBLAS-enabled `R_` and `Rscript_` in addition to the default `R` and `Rscript`.
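A rough way to run such a benchmark is to time the same R workload with and without the preload. The sketch below makes two assumptions: `Rscript` is on `PATH`, and NVBLAS is already configured as in the images discussed here (e.g. via `/etc/nvblas.conf`). The 2000 × 2000 matrix product is an arbitrary choice. R's startup time is included in every sample, so small workloads will understate any difference.

```python
import os
import statistics
import subprocess
import time

# Arbitrary R workload: one 2000 x 2000 matrix product, where the BLAS choice matters.
R_WORKLOAD = "m <- matrix(rnorm(4e6), 2000); invisible(m %*% m)"


def time_command(cmd, env=None, repeats=3):
    """Median wall-clock seconds needed to run `cmd` to completion."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        subprocess.run(cmd, env=env, check=True, stdout=subprocess.DEVNULL)
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def compare_blas(rscript="Rscript", repeats=3):
    """Time the workload with the default BLAS and with libnvblas.so preloaded."""
    default_env = {k: v for k, v in os.environ.items() if k != "LD_PRELOAD"}
    nvblas_env = dict(default_env, LD_PRELOAD="libnvblas.so")
    cmd = [rscript, "-e", R_WORKLOAD]
    return {
        "default": time_command(cmd, default_env, repeats),
        "nvblas": time_command(cmd, nvblas_env, repeats),
    }
```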
I found an issue in the BLAS configuration.
In the base nvidia-cuda image, the OpenBLAS libraries, while installed, are disabled in the /etc/alternatives configuration.
In the image with nvblas/cublas enabled, the system BLAS config is reset to the basic (slow) libraries, but /etc/nvblas.conf wraps the (fast) OpenBLAS version:
NVBLAS_CPU_BLAS_LIB /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
To reset the BLAS configuration, I had to include the following in my setup:
update-alternatives --auto libblas.so.3-x86_64-linux-gnu
update-alternatives --auto liblapack.so.3-x86_64-linux-gnu
To check (from a bash shell in the container):
Rscript -e 'sessionInfo()' | grep -e 'BLAS' -e 'LAPACK'
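The same check can be scripted, for example in a CI job after building the image. This sketch parses the `BLAS:`/`LAPACK:` lines that `sessionInfo()` prints, as in the R output shown further down this thread. It treats the configuration as fast only if both lines point to OpenBLAS libraries. That path-naming heuristic is an assumption.

```python
import subprocess


def blas_paths(session_info_text):
    """Map "BLAS"/"LAPACK" to the library path reported by R's sessionInfo()."""
    paths = {}
    for line in session_info_text.splitlines():
        key, colon, value = line.partition(":")
        key = key.strip()
        if colon and key in ("BLAS", "LAPACK"):
            paths[key] = value.split(";")[0].strip()  # drop "; LAPACK version ..."
    return paths


def uses_openblas(session_info_text):
    """True if both BLAS and LAPACK resolve to OpenBLAS libraries."""
    paths = blas_paths(session_info_text)
    return len(paths) == 2 and all("openblas" in path for path in paths.values())


def session_info(rscript="Rscript"):
    """Run sessionInfo() in a fresh R process and return its printed output."""
    return subprocess.run([rscript, "-e", "sessionInfo()"],
                          capture_output=True, text=True, check=True).stdout


sample_info = """\
BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so;  LAPACK version 3.10.0
"""
print(uses_openblas(sample_info))  # True
```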
@hute37 IIRC, BLAS was intentionally turned off by default due to https://github.com/rocker-org/rocker-versioned2/issues/471. I believe that was traced to an open issue with how either `numpy` or the `openblas` libraries handled suffixes on their symbols; see https://github.com/numpy/numpy/issues/21643 .
Although the `numpy` issue thread is still open, I believe that issue impacted only the older openblas libraries on Ubuntu 20.04, and that the newer openblas on 22.04 was not impacted. @hute37 would you be able to quickly test that, e.g. that the reprex in https://github.com/rstudio/reticulate/issues/1190 no longer segfaults when you enable openblas?
@eitsupi Do you think we could turn the openblas config back on by default for the 22.04 `cuda` images while leaving it off for the 20.04 images?
Sure. I think we just need to add Ubuntu 20.04 to the conditions in the following section.
This test runs without any errors under this configuration:
> sessionInfo()
R version 4.3.0 (2023-04-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 22.04.1 LTS
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so; LAPACK version 3.10.0
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
time zone: Europe/Rome
tzcode source: system (glibc)
...
> system('update-alternatives --display libblas.so.3-x86_64-linux-gnu; update-alternatives --display liblapack.so.3-x86_64-linux-gnu')
libblas.so.3-x86_64-linux-gnu - auto mode
link best version is /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
link currently points to /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
link libblas.so.3-x86_64-linux-gnu is /usr/lib/x86_64-linux-gnu/libblas.so.3
/usr/lib/x86_64-linux-gnu/blas/libblas.so.3 - priority 10
/usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 - priority 100
liblapack.so.3-x86_64-linux-gnu - auto mode
link best version is /usr/lib/x86_64-linux-gnu/openblas-pthread/liblapack.so.3
link currently points to /usr/lib/x86_64-linux-gnu/openblas-pthread/liblapack.so.3
link liblapack.so.3-x86_64-linux-gnu is /usr/lib/x86_64-linux-gnu/liblapack.so.3
/usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3 - priority 10
/usr/lib/x86_64-linux-gnu/openblas-pthread/liblapack.so.3 - priority 100
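When scripting this check, the relevant facts in the `update-alternatives --display` output are the mode and the current target. Below is a minimal parser for the format shown above. The helper names are hypothetical.

```python
import subprocess


def alternative_state(display_text):
    """Return (mode, current target) parsed from `update-alternatives --display` output."""
    lines = [line.strip() for line in display_text.splitlines() if line.strip()]
    header = lines[0] if lines else ""
    if header.endswith("auto mode"):
        mode = "auto"
    elif header.endswith("manual mode"):
        mode = "manual"
    else:
        mode = None
    prefix = "link currently points to"
    target = next((line[len(prefix):].strip() for line in lines if line.startswith(prefix)), None)
    return mode, target


def display(name):
    """Raw `update-alternatives --display <name>` output (Debian/Ubuntu)."""
    return subprocess.run(["update-alternatives", "--display", name],
                          capture_output=True, text=True, check=True).stdout


sample_display = """\
libblas.so.3-x86_64-linux-gnu - auto mode
  link best version is /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
  link currently points to /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
"""
print(alternative_state(sample_display))
```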
> system('apt list --installed | grep -i -e blas -e lapack ')
WARNING: apt does not have a stable CLI interface. Use with caution in scripts.
libblas-dev/now 3.10.0-2ubuntu1 amd64 [installed,local]
libblas3/now 3.10.0-2ubuntu1 amd64 [installed,local]
libcublas-11-8/now 11.11.3.6-1 amd64 [installed,local]
libcublas-dev-11-8/now 11.11.3.6-1 amd64 [installed,local]
libgslcblas0/now 2.7.1+dfsg-3 amd64 [installed,local]
liblapack-dev/now 3.10.0-2ubuntu1 amd64 [installed,local]
liblapack3/now 3.10.0-2ubuntu1 amd64 [installed,local]
libopenblas-dev/now 0.3.20+ds-1 amd64 [installed,local]
libopenblas-pthread-dev/now 0.3.20+ds-1 amd64 [installed,local]
libopenblas0-pthread/now 0.3.20+ds-1 amd64 [installed,local]
libopenblas0/now 0.3.20+ds-1 amd64 [installed,local]
> system('inxi')
CPU 6-core Intel Xeon E5-2690 v3 (-MCP-) speed 2597 MHz Kernel 5.15.0-1039-azure x86_64 Up 6h 28m
Mem 12268.8/56218.3 MiB (21.8%) Storage 1.08 TiB (28.9% used) Procs 8
> system('python --version; pyenv --version; poetry --version')
Python 3.10.6
pyenv 2.3.18
Poetry (version 1.5.1)
> system('poetry show | grep -e ^numpy -e ^matplotlib -e ^pip -e ^setuptools')
matplotlib 3.7.1 Python plotting package
matplotlib-inline 0.1.6 Inline Matplotlib backend for J...
numpy 1.23.5 NumPy is the fundamental packag...
pip 23.1 The PyPA recommended tool for i...
setuptools 67.6.1 Easily download, build, install...
>
I didn't test with the nvblas/cublas libraries ...
Nice! Thanks @hute37 for testing and @eitsupi for the PR, great work!
Just wondering if we want to revisit support for multiple CUDA tags for a given (latest) version of R. We bumped up to 11.8 with the R 4.2.2 / Ubuntu 22.04 release, and I'm observing that it is not compatible with host platforms that might be running older CUDA drivers. (Note that the host machine must have a driver that supports a CUDA version greater than or equal to that of the libraries in the container.)
NVIDIA provides 11.7.0 and 11.7.1 on Ubuntu 22.04, as well as 11.8.0, which we're using.
Weirdly, I have one machine with NVIDIA driver 470.141.03 (CUDA Version: 11.4) that runs the 11.8 image fine. A machine with slightly newer drivers (Driver Version: 515.65.01, CUDA Version: 11.7) can only run 11.7.1, not the 11.8 Dockerfiles.
I'd welcome other experiences; I'll try to triangulate this one a bit more too.
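The sketch below makes the comparison in these observations explicit. It checks a host driver against the driver bundled with a container's CUDA release. The table is taken from the earliest `cuda-compat` package listed for each release in the repository listing at the top of this thread. Treating that as "the driver the libraries require" is an assumption. The two machines above show that this cannot be the whole story. The 470 machine runs the 11.8 image even though its driver is older. Meanwhile, 515.65.01 is newer than the 11.7 driver but older than the 11.8 one.

```python
# Earliest driver shipped in cuda-compat for each CUDA release, per the
# repository listing at the top of this thread.
BUNDLED_DRIVER = {
    "11.4": "470.42.01",
    "11.5": "495.29.05",
    "11.6": "510.39.01",
    "11.7": "515.43.04",
    "11.8": "520.61.05",
    "12.0": "525.60.13",
    "12.1": "530.30.02",
}


def _ver(text):
    return tuple(int(part) for part in text.split("."))


def host_driver_older(host_driver, container_cuda):
    """True if the host driver predates the driver bundled with the container's CUDA release."""
    return _ver(host_driver) < _ver(BUNDLED_DRIVER[container_cuda])


for host in ("470.141.03", "515.65.01"):
    print(host, {cuda: host_driver_older(host, cuda) for cuda in ("11.7", "11.8")})
# 470.141.03 {'11.7': True, '11.8': True}
# 515.65.01 {'11.7': False, '11.8': True}
```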