Supporting multiple CUDA versions? (CUDA bumps to 11.8 on the 4.2.2 images)

Just wondering if we want to revisit support for multiple CUDA tags across a given/latest version of R. We bumped up to 11.8 with the R 4.2.2 / ubuntu 22.04 release, and I'm observing that it is not compatible with host platforms that might be running older CUDA drivers. (Note that the host machine's driver version has to be greater than or equal to that of the libraries in the containers.)

NVIDIA provides 11.7.0 and 11.7.1 on ubuntu-22, as well as 11.8.0 which we're using.

Weirdly, I have one machine with NVIDIA Driver 470.141.03 (CUDA Version: 11.4) that runs the 11.8 image fine, but a machine with slightly newer drivers (Driver Version: 515.65.01, CUDA Version: 11.7) can only run the 11.7.1 dockerfiles, not the 11.8 ones.
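As a rough illustration of the rule in the note above (host CUDA version >= container CUDA version), here is a minimal Python sketch. The helper names `cuda_tuple`, `parse_smi_header`, and `naive_can_run` are made up for this example. The check is deliberately naive: it predicts that the 11.4 host cannot run the 11.8 image, which contradicts the observation above, so treat it as a first guess rather than a statement of NVIDIA's actual compatibility policy.

```python
import re


def cuda_tuple(version):
    """Turn '11.7.1' or '11.8' into a comparable (major, minor) tuple."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def parse_smi_header(line):
    """Extract (driver_version, cuda_version) strings from an nvidia-smi header line."""
    driver = re.search(r"Driver Version:\s*([\d.]+)", line)
    cuda = re.search(r"CUDA Version:\s*([\d.]+)", line)
    if driver is None or cuda is None:
        raise ValueError("no driver/CUDA version found in: %r" % line)
    return driver.group(1), cuda.group(1)


def naive_can_run(host_cuda, image_cuda):
    """Naive rule: the CUDA version reported by the host must be >= the image's."""
    return cuda_tuple(host_cuda) >= cuda_tuple(image_cuda)


if __name__ == "__main__":
    # Values taken from the second machine described above.
    header = "Driver Version: 515.65.01 CUDA Version: 11.7"
    driver, host_cuda = parse_smi_header(header)
    for image in ("11.7.1", "11.8.0"):
        print(driver, host_cuda, image, naive_can_run(host_cuda, image))
```

Comparing only major and minor keeps 11.7 and 11.7.1 equal, which matches the second machine being able to run the 11.7.1 image.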

welcome other experiences, I'll try and triangulate this one a bit more too.

I'm busy with a CUDA (keras/tensorflow) setup for the aged NVIDIA Tesla K80 Azure datacenter GPU (2014):

NC-series

These GPUs are rather old and close to being retired by Microsoft, but they are very cheap and useful for the educational VMs used by students at our institution.

The NVIDIA kernel driver supported is branch 470:

in rocker/ml:4.2.3 container:

| NVIDIA-SMI 470.182.03   Driver Version: 470.182.03   CUDA Version: 11.8     |

from docs, 470 should be supported in all CUDA-11.x releases

3.3. Deployment Considerations for Forward Compatibility

But comparing repositories, the 470 (user-mode) driver is missing

ubuntu 22.04:

cuda-compat-11-7_515.43.04-1_amd64.deb
cuda-compat-11-7_515.48.07-1_amd64.deb
cuda-compat-11-7_515.65.01-1_amd64.deb
cuda-compat-11-7_515.65.07-1_amd64.deb
cuda-compat-11-7_515.86.01-1_amd64.deb
cuda-compat-11-7_515.105.01-1_amd64.deb
cuda-compat-11-8_520.61.05-1_amd64.deb
cuda-compat-12-0_525.60.13-1_amd64.deb
cuda-compat-12-0_525.85.12-1_amd64.deb
cuda-compat-12-0_525.105.17-1_amd64.deb
cuda-compat-12-1_530.30.02-1_amd64.deb

ubuntu 20.04:

cuda-compat-11-4_470.42.01-1_amd64.deb
cuda-compat-11-4_470.57.02-1_amd64.deb
cuda-compat-11-4_470.82.01-1_amd64.deb
cuda-compat-11-4_470.103.01-1_amd64.deb
cuda-compat-11-4_470.129.06-1_amd64.deb
cuda-compat-11-4_470.141.03-1_amd64.deb
cuda-compat-11-4_470.141.10-1_amd64.deb
cuda-compat-11-4_470.161.03-1_amd64.deb
cuda-compat-11-4_470.182.03-1_amd64.deb
cuda-compat-11-5_495.29.05-1_amd64.deb
cuda-compat-11-6_510.39.01-1_amd64.deb
cuda-compat-11-6_510.47.03-1_amd64.deb
cuda-compat-11-6_510.73.08-1_amd64.deb
cuda-compat-11-6_510.84-1_amd64.deb
cuda-compat-11-6_510.85.02-1_amd64.deb
cuda-compat-11-6_510.108.03-1_amd64.deb
cuda-compat-11-7_515.43.04-1_amd64.deb
cuda-compat-11-7_515.48.07-1_amd64.deb
cuda-compat-11-7_515.65.01-1_amd64.deb
cuda-compat-11-7_515.65.07-1_amd64.deb
cuda-compat-11-7_515.86.01-1_amd64.deb
cuda-compat-11-7_515.105.01-1_amd64.deb
cuda-compat-11-8_520.61.05-1_amd64.deb
cuda-compat-12-0_525.60.13-1_amd64.deb
cuda-compat-12-0_525.85.12-1_amd64.deb
cuda-compat-12-0_525.105.17-1_amd64.deb
cuda-compat-12-1_530.30.02-1_amd64.deb
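To make the gap easier to see, the package names can be grouped by CUDA version and driver branch. This is a small Python sketch, assuming the file names follow the `cuda-compat-<major>-<minor>_<driver>-<rev>_amd64.deb` pattern seen in the listings; the function names and the shortened `JAMMY`/`FOCAL` samples are only for illustration.

```python
import re
from collections import defaultdict

# cuda-compat-<major>-<minor>_<driver version>-<revision>_amd64.deb
PACKAGE = re.compile(r"cuda-compat-(\d+)-(\d+)_(\d+)\.[\d.]+-\d+_amd64\.deb")


def branches_by_cuda(package_names):
    """Map 'major.minor' CUDA versions to the sorted driver branches shipped for them."""
    branches = defaultdict(set)
    for name in package_names:
        match = PACKAGE.fullmatch(name.strip())
        if match:
            major, minor, branch = match.groups()
            branches[f"{major}.{minor}"].add(int(branch))
    return {cuda: sorted(found) for cuda, found in branches.items()}


def has_branch(package_names, branch):
    """True if any package in the listing belongs to the given driver branch."""
    return any(branch in found for found in branches_by_cuda(package_names).values())


# A few names copied from the two listings above (not the full lists).
JAMMY = [
    "cuda-compat-11-7_515.43.04-1_amd64.deb",
    "cuda-compat-11-8_520.61.05-1_amd64.deb",
    "cuda-compat-12-0_525.60.13-1_amd64.deb",
    "cuda-compat-12-1_530.30.02-1_amd64.deb",
]
FOCAL = JAMMY + [
    "cuda-compat-11-4_470.182.03-1_amd64.deb",
    "cuda-compat-11-5_495.29.05-1_amd64.deb",
    "cuda-compat-11-6_510.84-1_amd64.deb",
]

if __name__ == "__main__":
    print("470 on 22.04:", has_branch(JAMMY, 470))
    print("470 on 20.04:", has_branch(FOCAL, 470))
```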

I tried rocker/ml:4.2.3 container, based on cuda 11.8 (Ubuntu 22.04):

but it cannot work:

2023-04-14 18:04:34.360427: I tensorflow/compiler/xla/stream_executor/cuda/cuda_diagnostics.cc:169] retrieving CUDA diagnostic information for host: a5d8ae1ed794
2023-04-14 18:04:34.360439: I tensorflow/compiler/xla/stream_executor/cuda/cuda_diagnostics.cc:176] hostname: a5d8ae1ed794
2023-04-14 18:04:34.360546: I tensorflow/compiler/xla/stream_executor/cuda/cuda_diagnostics.cc:200] libcuda reported version is: 520.61.5
2023-04-14 18:04:34.360587: I tensorflow/compiler/xla/stream_executor/cuda/cuda_diagnostics.cc:204] kernel reported version is: 470.182.3
2023-04-14 18:04:34.360598: E tensorflow/compiler/xla/stream_executor/cuda/cuda_diagnostics.cc:313] kernel version 470.182.3 does not match DSO version 520.61.5 -- cannot find working devices in this configuration
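The mismatch reported in this log can be picked out programmatically. The following Python sketch assumes the two "reported version" message formats shown above; `reported_versions` and `branch` are hypothetical helper names, and the sample keeps only the tail of each log line.

```python
import re


def reported_versions(log_text):
    """Return (kernel_version, libcuda_version) from the CUDA diagnostics, or None."""
    libcuda = re.search(r"libcuda reported version is: ([\d.]+)", log_text)
    kernel = re.search(r"kernel reported version is: ([\d.]+)", log_text)
    if libcuda is None or kernel is None:
        return None
    return kernel.group(1), libcuda.group(1)


def branch(version):
    """Driver branch is the first component, e.g. 470 for '470.182.3'."""
    return int(version.split(".")[0])


# Shortened copies of the two relevant log lines above.
SAMPLE = (
    "cuda_diagnostics.cc:200] libcuda reported version is: 520.61.5\n"
    "cuda_diagnostics.cc:204] kernel reported version is: 470.182.3\n"
)

if __name__ == "__main__":
    kernel, libcuda = reported_versions(SAMPLE)
    if branch(kernel) != branch(libcuda):
        print(f"kernel driver {kernel} and user-mode libcuda {libcuda} are from different branches")
```

Here the kernel module is from the 470 branch while the `libcuda` found inside the container is from the 520 branch.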

I also tried rocker/ml:4.2.1-cuda11.1 container, based on cuda 11.1 (Ubuntu 20.04):

but I found some problems:

libnvinfer not installable (no libnvinfer8-11-1)
libdevice not found
CuDNN version mismatch

Maybe a possible solution for cuda-11 support for 470 branch:

CUDA-11.4 support (CUDA-11.5 requires 495 drivers)
Ubuntu 20.04
cuda-compat-11-4_470
libnvinfer8_8.2.5-1+cuda11.4

@hute37 Thanks for digging into this, as you see, we're still figuring out the best strategy for handling CUDA versioning in this stack.

I haven't had a chance to investigate here, this is a great start but we will need to dig a bit deeper still. As you know, there are at least three moving parts in the versioning scheme we need to triangulate:

hardware version
host machine drivers
container CUDA versions

Obviously rocker can only directly select versions in the third category. My issue up top referenced the second of these, but I think the right solution there is to recommend that the user update the host drivers, rather than attempting to support all drivers. But on to your issue: it would definitely be nice to retain support for older hardware. I'm a bit puzzled why the cuda11.1 setup is not viable here, but it may be due more to how rocker/ml:4.2.1-cuda11.1 is built than to CUDA itself? As you've noticed, that version and prior versions of the rocker CUDA stack added CUDA libs on top of the r-ver base image using custom scripts based on NVIDIA's containers, while in the current cuda 11.8 script we instead use official NVIDIA cuda Ubuntu-based images as our base image. (This was because at the time, NVIDIA only provided ubuntu-18.04 base images.)

So rather than proliferate too many tags, it might be better if we can see about patching the cuda11.1 image correctly for this? For the libnvinfer issue, do you need libnvinfer8 ? IIRC, that release would have been aligned to libnvinfer7, no? Also not sure about the libdevice and CuDNN errors -- but not clear to me that the 11.1 vs 11.4 is really the source of the problem there?

Maybe the question is easy to formulate, but the answer is not ...

Q: "Which is the latest tensorflow version that can be used (by CRAN-keras) with obsolete (470-driver line) NVIDIA gpus ?"

Because of stack dependencies, the answer depends on several sub-questions:

470 kernel drivers (in host OS) require the same cuda-compat-470 user-driver in container
to support containers, also NVIDIA Container Toolkit is required
nvidia repository 470 driver availability seems to be limited to Ubuntu 20.04 only, with no support for 22.04
CUDA versions choice is very critical: only one of CUDA 11-x should be selected (10-x is rather old, 12-x is unsupported):
- CUDA 11-01 has no libnvinfer package (strange ...)
- CUDA 11-04 has the latest cuda-compat-11-4 user driver (required to match 470 kernel driver)
- CUDA 11-08 seems the "minimal" version required by tensorflow 2.12
latest CRAN keras package installs tensorflow version 2.11 by default
CuDNN >= 8.6 seems to be a prerequisite for tensorflow v 2.11
nvBLAS/cuBLAS (>= 11.*) integration with OpenBLAS
optional TensorRT support (?)

While NVIDIA declares full support for 470 line until CUDA-12-2, in NVIDIA repositories some combinations are not supported, in particular for obsolete hardware.

I'd prefer apt based installations, but maybe another installation method could fill the gaps?

I would like to avoid conda/miniconda stacks because I need to interoperate with projects based on pyenv/poetry python environments.

chatGPT wasn't helpful ...

Thanks, this is definitely helpful.

NVIDIA obviously isn't making it easy for us by insisting that

host driver version must be >= toolkit version (i.e. drivers are backwards compatible with toolkit software)

while at the same time insisting that

drivers are not backwards compatible with old hardware.

The first choice makes sense to me, in that it allows users to still run older software and newer software by staying current on their drivers.

The second choice seems unfortunate, and is basically saying that if you want to use old hardware you'll be stuck on old software too. (Obviously that's financially in the interest of a company selling new hardware and may contribute to Microsoft's choice here too).

So I think this also supports your formulation of the question: the only way forward on old hardware is to lock in an old version of all the software as well, including an old version of keras, tensorflow, and cuda toolkit. Does that sound accurate?

Ok so now for nuts and bolts. given the above, I think it won't be viable to look for a solution that takes the default tensorflow version from current CRAN version of keras as the constraint -- it's not clear from the above that tensorflow 2.11 was intended for a driver 470 / cuda 11.1 / ubuntu 20.04 environment?

I don't have a machine running the 470 drivers available, so I can't help much to check things here, but can you see about some earlier versions of tensorflow? (in particular I'm not clear on the history of the libnvinfer libs here, they may have been introduced only later?)

It worked! But in a "manual (hammered)" non-containerized setup ...

One point is compatibility between (host) kernel driver from 470 line and user-mode cuda-driver (from cuda-compat-470, prepended in LD_LIBRARY_PATH and found before "standard" 5xx cuda-driver). For containers, NVIDIA container toolkit is also required.

With this setup, I could install CUDA-11-8 libraries. I successfully tested a simple array multiplication performed in GPU.

Another very important topic is cuDNN support.

cuDNN is the library where the ML GPU magic happens. TF is very tightly linked to this library. cuDNN is built around GPU compute capabilities, identified by a level code.

For instance, Tesla (Kepler) K80 has 3.7 capability level.

Your hardware supports a fixed level of compute capability. That fixes the maximum version of the cuDNN library supporting that GPU, which in turn fixes the maximum version of TensorFlow that can be used on that system.

Some useful references:

A working configuration ...

Azure NC6 Virtual Machine

» inxi -b
System:
  Host: xt-si701-v01 Kernel: 5.15.0-1035-azure x86_64 bits: 64
    Desktop: Xfce 4.16.0 Distro: Ubuntu 22.04.2 LTS (Jammy Jellyfish)
Machine:
  Type: Desktop Mobo: Microsoft model: Virtual Machine v: 7.0
    serial: <superuser required> BIOS: American Megatrends v: 090007
    date: 06/02/2017
CPU:
  Info: 6-core Intel Xeon E5-2690 v3 [MCP] speed (MHz): avg: 2597
Graphics:
  Device-1: Microsoft Hyper-V virtual VGA driver: hyperv_drm v: kernel
  Device-2: NVIDIA GK210GL [Tesla K80] driver: nvidia v: 470.182.03
  Display: x11 server: X.Org v: 1.20.9 driver: X: loaded: N/A
    unloaded: modesetting gpu: hyperv_drm note:  X driver n/a
    resolution: 1920x1080~60Hz
  OpenGL: renderer: llvmpipe (LLVM 15.0.6 256 bits) v: 4.5 Mesa 22.2.5

» lspci | grep NVIDIA
0001:00:00.0 3D controller: NVIDIA Corporation GK210GL [Tesla K80] (rev a1)

» nvidia-smi
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.182.03   Driver Version: 470.182.03   CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla K80           Off  | 00000001:00:00.0 Off |                    0 |
| N/A   38C    P0    81W / 149W |      0MiB / 11441MiB |     55%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

Drivers and Container Toolkit: Apt installed

» apt list --installed | grep -i -e nvidia -e cuda

WARNING: apt does not have a stable CLI interface. Use with caution in scripts.

cuda-keyring/unknown,now 1.0-1 all [installed]
libnvidia-cfg1-470/jammy-updates,jammy-security,now 470.182.03-0ubuntu0.22.04.1 amd64 [installed,automatic]
libnvidia-common-470/jammy-updates,jammy-security,now 470.182.03-0ubuntu0.22.04.1 all [installed,automatic]
libnvidia-compute-470/jammy-updates,jammy-security,now 470.182.03-0ubuntu0.22.04.1 amd64 [installed,automatic]
libnvidia-container-tools/unknown,now 1.13.0-1 amd64 [installed,automatic]
libnvidia-container1/unknown,now 1.13.0-1 amd64 [installed,automatic]
libnvidia-decode-470/jammy-updates,jammy-security,now 470.182.03-0ubuntu0.22.04.1 amd64 [installed,automatic]
libnvidia-egl-wayland1/jammy,now 1:1.1.9-1.1 amd64 [installed,automatic]
libnvidia-encode-470/jammy-updates,jammy-security,now 470.182.03-0ubuntu0.22.04.1 amd64 [installed,automatic]
libnvidia-extra-470/jammy-updates,jammy-security,now 470.182.03-0ubuntu0.22.04.1 amd64 [installed,automatic]
libnvidia-fbc1-470/jammy-updates,jammy-security,now 470.182.03-0ubuntu0.22.04.1 amd64 [installed,automatic]
libnvidia-gl-470/jammy-updates,jammy-security,now 470.182.03-0ubuntu0.22.04.1 amd64 [installed,automatic]
libnvidia-ifr1-470/jammy-updates,jammy-security,now 470.182.03-0ubuntu0.22.04.1 amd64 [installed,automatic]
mate-sensors-applet-nvidia/jammy,now 1.26.0-1 amd64 [installed]
nvidia-compute-utils-470/jammy-updates,jammy-security,now 470.182.03-0ubuntu0.22.04.1 amd64 [installed,automatic]
nvidia-container-toolkit-base/unknown,now 1.13.0-1 amd64 [installed]
nvidia-container-toolkit/unknown,now 1.13.0-1 amd64 [installed]
nvidia-dkms-470/jammy-updates,jammy-security,now 470.182.03-0ubuntu0.22.04.1 amd64 [installed,automatic]
nvidia-driver-470/jammy-updates,jammy-security,now 470.182.03-0ubuntu0.22.04.1 amd64 [installed]
nvidia-kernel-common-470/jammy-updates,jammy-security,now 470.182.03-0ubuntu0.22.04.1 amd64 [installed,automatic]
nvidia-kernel-source-470/jammy-updates,jammy-security,now 470.182.03-0ubuntu0.22.04.1 amd64 [installed,automatic]
nvidia-prime/jammy,now 0.8.17.1 all [installed,automatic]
nvidia-settings/unknown,now 530.30.02-0ubuntu1 amd64 [installed,automatic]
nvidia-utils-470/jammy-updates,jammy-security,now 470.182.03-0ubuntu0.22.04.1 amd64 [installed]
xserver-xorg-video-nvidia-470/jammy-updates,jammy-security,now 470.182.03-0ubuntu0.22.04.1 amd64 [installed,automatic]

Manual Installation in /usr/local/cuda

I manually downloaded installation packages:

cuda_11.8.0_520.61.05_linux.run
cuda-compat-11-4_470.182.03-1_amd64.deb
cudnn-linux-x86_64-8.6.0.163_cuda11-archive.tar.xz
libnvinfer8_8.5.3-1+cuda11.8_amd64.deb
libnvinfer-dev_8.5.3-1+cuda11.8_amd64.deb
libnvinfer-plugin8_8.5.3-1+cuda11.8_amd64.deb
libnvinfer-plugin-dev_8.5.3-1+cuda11.8_amd64.deb

What	Version	Mode	Source
Kernel Driver	470.182.03	apt	nvidia-driver-470
CUDA Driver	470.*	local	cuda-compat-11-4_470.182.03-*
CUDA Libraries	11-8	local	cuda_11.8.0_520*_linux.run
libnvinfer	8.5.3-*	local	libnvinfer8_8.5.3 (+plugin8, + dev)
cuDNN	8.6.0.*	local	cudnn-linux-x86_64-8.6.0.163_cuda11-archive.tar.xz
TensorFlow	2.12.0	PyPI	poetry

Also required under /etc/ld.so.conf.d

$ cd /etc/ld.so.conf.d
$ ls -1 1*cuda*
114_cuda-11-compat.conf
118_cuda-11-local.conf
$ cat 1*cuda*
/usr/local/cuda/compat
#/usr/local/cuda-11.8/lib64
/usr/local/cuda/lib64
#/usr/local/cuda-11.8/lib64

# update cache

sudo ldconfig

Check Libraries:

$ ldconfig -p | grep -i -e 'libcuda.so' -e 'lib..blas.*.so' -e 'libcudnn.so' -e 'libnvinfer.*.so' -e 'libcudnn.*.so'

    libnvinfer_plugin.so.8 (libc6,x86-64) => /usr/local/cuda/lib64/libnvinfer_plugin.so.8
    libnvinfer_plugin.so (libc6,x86-64) => /usr/local/cuda/lib64/libnvinfer_plugin.so
    libnvinfer.so.8 (libc6,x86-64) => /usr/local/cuda/lib64/libnvinfer.so.8
    libnvinfer.so (libc6,x86-64) => /usr/local/cuda/lib64/libnvinfer.so
    libnvblas.so.11 (libc6,x86-64) => /usr/local/cuda/lib64/libnvblas.so.11
    libnvblas.so (libc6,x86-64) => /usr/local/cuda/lib64/libnvblas.so
    libcudnn_ops_train.so.8 (libc6,x86-64) => /usr/local/cuda/lib64/libcudnn_ops_train.so.8
    libcudnn_ops_train.so (libc6,x86-64) => /usr/local/cuda/lib64/libcudnn_ops_train.so
    libcudnn_ops_infer.so.8 (libc6,x86-64) => /usr/local/cuda/lib64/libcudnn_ops_infer.so.8
    libcudnn_ops_infer.so (libc6,x86-64) => /usr/local/cuda/lib64/libcudnn_ops_infer.so
    libcudnn_cnn_train.so.8 (libc6,x86-64) => /usr/local/cuda/lib64/libcudnn_cnn_train.so.8
    libcudnn_cnn_train.so (libc6,x86-64) => /usr/local/cuda/lib64/libcudnn_cnn_train.so
    libcudnn_cnn_infer.so.8 (libc6,x86-64) => /usr/local/cuda/lib64/libcudnn_cnn_infer.so.8
    libcudnn_cnn_infer.so (libc6,x86-64) => /usr/local/cuda/lib64/libcudnn_cnn_infer.so
    libcudnn_adv_train.so.8 (libc6,x86-64) => /usr/local/cuda/lib64/libcudnn_adv_train.so.8
    libcudnn_adv_train.so (libc6,x86-64) => /usr/local/cuda/lib64/libcudnn_adv_train.so
    libcudnn_adv_infer.so.8 (libc6,x86-64) => /usr/local/cuda/lib64/libcudnn_adv_infer.so.8
    libcudnn_adv_infer.so (libc6,x86-64) => /usr/local/cuda/lib64/libcudnn_adv_infer.so
    libcudnn.so.8 (libc6,x86-64) => /usr/local/cuda/lib64/libcudnn.so.8
    libcudnn.so (libc6,x86-64) => /usr/local/cuda/lib64/libcudnn.so
    libcuda.so.1 (libc6,x86-64) => /usr/local/cuda/compat/libcuda.so.1
    libcuda.so.1 (libc6,x86-64) => /lib/x86_64-linux-gnu/libcuda.so.1
    libcuda.so (libc6,x86-64) => /usr/local/cuda/compat/libcuda.so
    libcuda.so (libc6,x86-64) => /lib/x86_64-linux-gnu/libcuda.so
    libcublasLt.so.11 (libc6,x86-64) => /usr/local/cuda/lib64/libcublasLt.so.11
    libcublasLt.so (libc6,x86-64) => /usr/local/cuda/lib64/libcublasLt.so
    libcublas.so.11 (libc6,x86-64) => /usr/local/cuda/lib64/libcublas.so.11
    libcublas.so (libc6,x86-64) => /usr/local/cuda/lib64/libcublas.so
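The key point in this listing is that `libcuda.so.1` appears twice and the compat copy comes first. A small Python sketch of that check follows; it assumes that the first entry printed by `ldconfig -p` for a given name is the one the loader will pick, which is an assumption worth verifying on your system. `first_resolution` is a hypothetical helper name.

```python
def first_resolution(ldconfig_output):
    """Map each library name to the first path listed for it in `ldconfig -p` output."""
    first = {}
    for line in ldconfig_output.splitlines():
        if "=>" not in line:
            continue
        left, path = line.split("=>", 1)
        name = left.split()[0]
        # setdefault keeps the earliest entry and ignores later duplicates
        first.setdefault(name, path.strip())
    return first


# Lines copied from the listing above.
SAMPLE = """\
    libcudnn.so.8 (libc6,x86-64) => /usr/local/cuda/lib64/libcudnn.so.8
    libcuda.so.1 (libc6,x86-64) => /usr/local/cuda/compat/libcuda.so.1
    libcuda.so.1 (libc6,x86-64) => /lib/x86_64-linux-gnu/libcuda.so.1
"""

if __name__ == "__main__":
    resolved = first_resolution(SAMPLE)
    print("libcuda.so.1 ->", resolved["libcuda.so.1"])
    print("compat first:", resolved["libcuda.so.1"].startswith("/usr/local/cuda/compat/"))
```

On a real machine the input could come from `subprocess.run(["ldconfig", "-p"], capture_output=True, text=True).stdout`.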

A simple (python) test script:

dummy-gpu-tf.py

» python exec/dummy-gpu-tf.py                                                                         
2023-04-19 18:32:53.603349: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.                                                                                 
Num GPUs Available:  1                             
GPUs:  [PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]                                                                                                                                    
2023-04-19 18:33:01.566433: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1635] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 10766 MB memory:  -> device: 0, name: Tesla K80, pci bus id: 0001:00:00.0, compute capability: 3.7

...

Epoch 1/15                                         
2023-04-19 18:33:03.653302: I tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:424] Loaded cuDNN version 8600                                                                                        
2023-04-19 18:33:04.129065: I tensorflow/compiler/xla/service/service.cc:169] XLA service 0x7f52e400cf70 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2023-04-19 18:33:04.129092: I tensorflow/compiler/xla/service/service.cc:177]   StreamExecutor device (0): Tesla K80, Compute Capability 3.7                                                                 
2023-04-19 18:33:04.135966: I tensorflow/compiler/mlir/tensorflow/utils/dump_mlir_util.cc:269] disabling MLIR crash reproducer, set env var `MLIR_CRASH_REPRODUCER_DIRECTORY` to enable.
2023-04-19 18:33:04.308010: I ./tensorflow/compiler/jit/device_compiler.h:180] Compiled cluster using XLA!  This line is logged at most once for the lifetime of the process.
422/422 [==============================] - 6s 8ms/step - loss: 0.3709 - accuracy: 0.8885 - val_loss: 0.0797 - val_accuracy: 0.9798                                                                           

...
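The "Num GPUs Available" and "GPUs" lines in the output above are consistent with the device-listing check from the TensorFlow "Use a GPU" guide. The following is a minimal sketch of that check only, not the contents of dummy-gpu-tf.py; the `list_gpus` wrapper and its fallback when TensorFlow is missing are additions for this example.

```python
def list_gpus():
    """Return TensorFlow's visible GPU devices, or None if TensorFlow is not installed."""
    try:
        import tensorflow as tf
    except ImportError:
        return None
    # Returns a list of PhysicalDevice objects, empty when no GPU is usable.
    return tf.config.list_physical_devices("GPU")


if __name__ == "__main__":
    gpus = list_gpus()
    if gpus is None:
        print("TensorFlow is not installed in this environment")
    else:
        print("Num GPUs Available: ", len(gpus))
        print("GPUs: ", gpus)
```

An empty list on a machine with a GPU usually points at the driver or library mismatch discussed earlier rather than at TensorFlow itself.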

Building a container is a different matter ...

Maybe a multistage container build could be used to grab cuda parts from different base images (?)

It worked!

Sources

Image Patch

install_ubs-cuda-11-470.sh

Test Scripts

Test based on Tensorflow guide: "Use a GPU"

Configuration

Rocker-Project

Base Image	Version
ml	4.2.3
ml-verse	4.2.3

Hardware

Virtual Machine	GPU Card	GPU Family	NVIDIA driver version	compute capabilities
Azure NC6	K80	Tesla/Kepler	470.*	3.7
Azure NV12	M60	Tesla/Maxwell	470.*	5.2

Host OS

Component	Name	version	note
Operating System	Ubuntu	18.04, 22.04
Container Environ	Podman	3.4.2	rootless mode
Container Runtime	crun
Container Storage	fuse-overlay		under ubuntu 18.04, rootless mode requires `fuse-overlay`
NVIDIA Kernel Driver	nvidia-driver-470	470.182.03	dkms apt-get install from NVIDIA repo
NVIDIA Compute Driver	libnvidia-compute-470	470.182.03	cuda-11 apt-get install from NVIDIA repo
NVIDIA Container Toolkit	nvidia-container-toolkit	1.13.1-1	cuda-11 apt-get install from NVIDIA repo

Image CUDA Stack

Component	Name	version	note
Operating System	Ubuntu	22.04
Framework	CUDA-11	11.8
"Compat" Libraries	-	-	removed from base image, in conflict with host container toolkit
CUDA Compiler	NVCC	-	installed; required for bytecode generation
ML DNN support	cuDNN	8.6	downgraded for GPU (3.7 level) compatibility support
BLAS	nvblas/cublas		unsupported; this library cannot be enabled in build phase during `install.r` package setup

Image Python Stack

Component	Name	version	note
Environment manager	pyenv
Language Interpreter	python	3.10.6	pyenv install
Package Manager	poetry		`pyproject.toml` definition
ML Framework	Tensorflow	2.11
ML Modelling	Keras

@hute37 hey well done, that's pretty cool! so it looks like dropping compat libraries and rolling cuDNN back to 8.6 was key? Nicely written install script, thanks for sharing!

@cboettig

Matching cuDNN's compute capability requirements to your GPU is critical. NVIDIA declares support for the 3.5+ level (and the 470 driver) for all CUDA-11 releases, but later components seem to break support.

Also NVCC compiler is a requirement: I have read something about suppression of byte-code cache for some GPU, so compiler must be present to regenerate cache on the fly (with a noticeable startup delay?)

libcuda-compat is a strange thing ... Early cuda support in containers directly exported the GPU device, that was handled inside the container. Recent model (NVIDIA container toolkit) keeps the image agnostic in terms of GPU model and availability (in a CI/CD pipeline, many systems share the same image with different GPU). It is critical that the host running the container "injects" the right driver (mount) at runtime. So far, so good ... The strange fact was that, if compat was present in container, internal driver (520) took precedence over injected one (470)

One thing is still missing: BLAS/LAPACK support, ...

I tried to patch R/Rscript renviron to enable LD_PRELOAD trick to link nvblas/cublas library in front of standard OpenBlas (what about MKL?). The problem here is that GPU is not available during image build phase and all install.r calls generated a lot of warnings. Runtime inclusion of nvblas would be better but is rather complex ...

In terms of support lines, maybe a cuda-11 (470/520 driver compatible) base image line could be a nice addition, while the "latest and greatest" images could support cuda-12+ and driver 520 only

Runtime inclusion of nvblas would be better but is rather complex ...

To provide NVBLAS-enabled R and Rscript:

cp -a $(which R) $(which R)_
echo '#!/bin/bash' > $(which R)
echo "command -v nvidia-smi >/dev/null && nvidia-smi -L | grep 'GPU[[:space:]]\?[[:digit:]]\+' >/dev/null && export LD_PRELOAD=libnvblas.so" >> $(which R)
echo "$(which R_) \"\${@}\"" >> $(which R)

cp -a $(which Rscript) $(which Rscript)_
echo '#!/bin/bash' > $(which Rscript)
echo "command -v nvidia-smi >/dev/null && nvidia-smi -L | grep 'GPU[[:space:]]\?[[:digit:]]\+' >/dev/null && export LD_PRELOAD=libnvblas.so" >> $(which Rscript)
echo "$(which Rscript_) \"\${@}\"" >> $(which Rscript)

👉 Enabled at runtime and only if nvidia-smi and at least one GPU are present.

(LD_LIBRARY_PATH=/usr/local/nvidia/lib:/usr/local/nvidia/lib64:/usr/local/cuda/lib:/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64)
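For readers who find the nested shell quoting hard to follow, the decision the wrapper makes can be restated in Python. This is only a sketch of the same logic, assuming `nvidia-smi -L` prints lines of the form `GPU 0: <name> (UUID: ...)`; `has_gpu` and `r_environment` are hypothetical names, and the sample listings are illustrative.

```python
import re

# Same idea as the grep pattern above: "GPU", an optional space, then digits.
GPU_LINE = re.compile(r"GPU\s?\d+")


def has_gpu(listing):
    """True if the `nvidia-smi -L` output mentions at least one GPU."""
    return bool(GPU_LINE.search(listing))


def r_environment(base_env, listing):
    """Return a copy of base_env, adding LD_PRELOAD only when a GPU is listed.

    `listing` is None when nvidia-smi is not installed at all.
    """
    env = dict(base_env)
    if listing is not None and has_gpu(listing):
        env["LD_PRELOAD"] = "libnvblas.so"
    return env


if __name__ == "__main__":
    with_gpu = "GPU 0: Tesla K80 (UUID: GPU-xxxxxxxx)"  # illustrative listing
    print(r_environment({"HOME": "/root"}, with_gpu))
    print(r_environment({"HOME": "/root"}, None))
```

The resulting dictionary could be passed as `env=` to `subprocess.run` when launching R.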

Run some benchmarks to ensure that NVBLAS actually outperforms OpenBLAS.
ℹ️ In some of my setups it does not. That is why I decided to provide NVBLAS-enabled R_ and Rscript_ in addition to the default R and Rscript.

References:

Accelerated R with CUDA on Linux – Clint's Blog
https://github.com/b-data/r-docker-stack/blob/main/cuda/latest.Dockerfile#L38-L50. ℹ️ glcr.b-data.ch/cuda/r/ver:R_VERSION-devel serves as parent image for glcr.b-data.ch/jupyterlab/cuda/r/base:R_VERSION.

I found an issue in BLAS configuration.

In base nvidia-cuda image, OpenBLAS libraries, while installed, are disabled in /etc/alternatives configuration.

In the image, with nvblas/cublas enabled, system BLAS config is reset to basic (slow) libraries, but /etc/nvblas.conf wraps (fast) OpenBLAS version:

NVBLAS_CPU_BLAS_LIB /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3

To reset BLAS configuration I had to include in my setup:

update-alternatives --auto    libblas.so.3-x86_64-linux-gnu     
update-alternatives --auto    liblapack.so.3-x86_64-linux-gnu

To check, (from container bash):

Rscript -e 'sessionInfo()' | grep -e 'BLAS' -e 'LAPACK'
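If you want to check this in a script rather than by eye, the `BLAS:` and `LAPACK:` lines can be classified with a few lines of Python. This sketch assumes the `sessionInfo()` line format of recent R versions (a `BLAS:` or `LAPACK:` label followed by a library path) and simply looks for "openblas" in the path; `blas_backend` is a hypothetical helper name, and the sample paths are the openblas-pthread locations on Ubuntu.

```python
def blas_backend(session_info):
    """Return {'BLAS': (path, is_openblas), 'LAPACK': (path, is_openblas)} for lines found."""
    result = {}
    for line in session_info.splitlines():
        key, sep, value = line.partition(":")
        key = key.strip()
        if sep and key in ("BLAS", "LAPACK"):
            # Drop trailing notes such as ";  LAPACK version 3.10.0"
            path = value.split(";")[0].strip()
            result[key] = (path, "openblas" in path)
    return result


# Example lines in the format printed by sessionInfo().
SAMPLE = """\
BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so;  LAPACK version 3.10.0
"""

if __name__ == "__main__":
    for name, (path, is_openblas) in blas_backend(SAMPLE).items():
        print(name, "openblas" if is_openblas else "other", path)
```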

References:

Improving R Perfomance by installing optimized BLAS/LAPACK libraries

@hute37 iirc, BLAS was intentionally turned off by default due to https://github.com/rocker-org/rocker-versioned2/issues/471 , which I believe was traced to an open issue with how either numpy or the openblas libraries handled suffixes on its symbols, see https://github.com/numpy/numpy/issues/21643 .

Although the numpy issue thread is still open, I believe that issue impacted only older libraries on Ubuntu 20.04 openblas, and that the newer openblas on 22.04 was not impacted. @hute37 would you be able to quickly test that, e.g. the reprex in https://github.com/rstudio/reticulate/issues/1190 no longer segfaults when you enable openblas?

@eitsupi Do you think we could turn openblas config back on by default for 22.04 cuda images while leaving it off for the 20.04 images?

Sure. I think we just need to add Ubuntu 20.04 to the conditions in the following section.

https://github.com/rocker-org/rocker-versioned2/blob/8279ff1f01eb1c9d58ee1a72f7821033253a4838/scripts/install_python.sh#L43-L50

This test runs without any errors under this configuration:

"Segfault using scipy filter"

> sessionInfo()

R version 4.3.0 (2023-04-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 22.04.1 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so;  LAPACK version 3.10.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

time zone: Europe/Rome
tzcode source: system (glibc)

...

> system('update-alternatives --display libblas.so.3-x86_64-linux-gnu; update-alternatives --display liblapack.so.3-x86_64-linux-gnu')
libblas.so.3-x86_64-linux-gnu - auto mode
  link best version is /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
  link currently points to /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
  link libblas.so.3-x86_64-linux-gnu is /usr/lib/x86_64-linux-gnu/libblas.so.3
/usr/lib/x86_64-linux-gnu/blas/libblas.so.3 - priority 10
/usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 - priority 100
liblapack.so.3-x86_64-linux-gnu - auto mode
  link best version is /usr/lib/x86_64-linux-gnu/openblas-pthread/liblapack.so.3
  link currently points to /usr/lib/x86_64-linux-gnu/openblas-pthread/liblapack.so.3
  link liblapack.so.3-x86_64-linux-gnu is /usr/lib/x86_64-linux-gnu/liblapack.so.3
/usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3 - priority 10
/usr/lib/x86_64-linux-gnu/openblas-pthread/liblapack.so.3 - priority 100

> system('apt list --installed | grep -i -e blas -e lapack ')

WARNING: apt does not have a stable CLI interface. Use with caution in scripts.

libblas-dev/now 3.10.0-2ubuntu1 amd64 [installed,local]
libblas3/now 3.10.0-2ubuntu1 amd64 [installed,local]
libcublas-11-8/now 11.11.3.6-1 amd64 [installed,local]
libcublas-dev-11-8/now 11.11.3.6-1 amd64 [installed,local]
libgslcblas0/now 2.7.1+dfsg-3 amd64 [installed,local]
liblapack-dev/now 3.10.0-2ubuntu1 amd64 [installed,local]
liblapack3/now 3.10.0-2ubuntu1 amd64 [installed,local]
libopenblas-dev/now 0.3.20+ds-1 amd64 [installed,local]
libopenblas-pthread-dev/now 0.3.20+ds-1 amd64 [installed,local]
libopenblas0-pthread/now 0.3.20+ds-1 amd64 [installed,local]
libopenblas0/now 0.3.20+ds-1 amd64 [installed,local]

> system('inxi')
CPU 6-core Intel Xeon E5-2690 v3 (-MCP-) speed 2597 MHz Kernel 5.15.0-1039-azure x86_64 Up 6h 28m
Mem 12268.8/56218.3 MiB (21.8%) Storage 1.08 TiB (28.9% used) Procs 8

> system('python --version; pyenv --version; poetry --version')
Python 3.10.6
pyenv 2.3.18
Poetry (version 1.5.1)

> system('poetry show | grep -e ^numpy -e ^matplotlib -e ^pip -e ^setuptools')
matplotlib                    3.7.1         Python plotting package
matplotlib-inline             0.1.6         Inline Matplotlib backend for J...
numpy                         1.23.5        NumPy is the fundamental packag...
pip                           23.1          The PyPA recommended tool for i...
setuptools                    67.6.1        Easily download, build, install...

>

I didn't test with the nvblas/cublas libraries ...

Nice! Thanks @hute37 for testing and @eitsupi for the PR, great work!
