rocker-org / rocker-versioned2

Run current & prior versions of R using docker. rocker/r-ver, rocker/rstudio, rocker/shiny, rocker/tidyverse, and so on.
https://rocker-project.org
GNU General Public License v2.0

Supporting multiple CUDA versions? (CUDA bumps to 11.8 on the 4.2.2 images) #582

Open cboettig opened 1 year ago

cboettig commented 1 year ago

Just wondering if we want to revisit support for multiple CUDA tags for a given/latest version of R. We bumped up to CUDA 11.8 with the R 4.2.2 / Ubuntu 22.04 release, and I'm observing that it is not compatible with host platforms that may be running older CUDA drivers. (Note that the host machine has to have a driver version greater than or equal to what the libraries on the containers require.)
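A quick way to compare the two sides is to read the driver version on the host and the CUDA userspace the image was built against; a minimal sketch (the image tag is only an example, and relying on the CUDA_VERSION environment variable assumes the image derives from the official NVIDIA base images):

# On the host: kernel driver and the maximum CUDA version it supports
nvidia-smi

# In the container: the CUDA userspace the image ships
docker run --rm rocker/ml:4.2.2 env | grep -i cuda
docker run --rm --gpus all rocker/ml:4.2.2 nvidia-smi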

NVIDIA provides 11.7.0 and 11.7.1 on ubuntu-22, as well as 11.8.0 which we're using.

Weirdly, I have one machine with NVIDIA driver 470.141.03 (CUDA Version: 11.4) that runs the 11.8 image fine, but a machine with slightly newer drivers (Driver Version: 515.65.01, CUDA Version: 11.7) can only run the 11.7.1 images, not the 11.8 Dockerfiles.

I welcome other experiences; I'll try to triangulate this one a bit more too.

hute37 commented 1 year ago

I'm busy with a CUDA (keras/tensorflow) setup for aged NVIDIA Tesla K80 Azure datacenter GPUs (2014).

These GPUs are rather old and close to being retired by Microsoft, but they are very cheap and useful for the educational VMs used by students at our institution.

The supported NVIDIA kernel driver is the 470 branch:

in rocker/ml:4.2.3 container:

| NVIDIA-SMI 470.182.03   Driver Version: 470.182.03   CUDA Version: 11.8     |

According to the docs, the 470 branch should be supported by all CUDA 11.x releases.

But comparing the repositories, the 470 (user-mode) compatibility driver is missing from one of them:

cuda-compat-11-7_515.43.04-1_amd64.deb
cuda-compat-11-7_515.48.07-1_amd64.deb
cuda-compat-11-7_515.65.01-1_amd64.deb
cuda-compat-11-7_515.65.07-1_amd64.deb
cuda-compat-11-7_515.86.01-1_amd64.deb
cuda-compat-11-7_515.105.01-1_amd64.deb
cuda-compat-11-8_520.61.05-1_amd64.deb
cuda-compat-12-0_525.60.13-1_amd64.deb
cuda-compat-12-0_525.85.12-1_amd64.deb
cuda-compat-12-0_525.105.17-1_amd64.deb
cuda-compat-12-1_530.30.02-1_amd64.deb
cuda-compat-11-4_470.42.01-1_amd64.deb
cuda-compat-11-4_470.57.02-1_amd64.deb
cuda-compat-11-4_470.82.01-1_amd64.deb
cuda-compat-11-4_470.103.01-1_amd64.deb
cuda-compat-11-4_470.129.06-1_amd64.deb
cuda-compat-11-4_470.141.03-1_amd64.deb
cuda-compat-11-4_470.141.10-1_amd64.deb
cuda-compat-11-4_470.161.03-1_amd64.deb
cuda-compat-11-4_470.182.03-1_amd64.deb
cuda-compat-11-5_495.29.05-1_amd64.deb
cuda-compat-11-6_510.39.01-1_amd64.deb
cuda-compat-11-6_510.47.03-1_amd64.deb
cuda-compat-11-6_510.73.08-1_amd64.deb
cuda-compat-11-6_510.84-1_amd64.deb
cuda-compat-11-6_510.85.02-1_amd64.deb
cuda-compat-11-6_510.108.03-1_amd64.deb
cuda-compat-11-7_515.43.04-1_amd64.deb
cuda-compat-11-7_515.48.07-1_amd64.deb
cuda-compat-11-7_515.65.01-1_amd64.deb
cuda-compat-11-7_515.65.07-1_amd64.deb
cuda-compat-11-7_515.86.01-1_amd64.deb
cuda-compat-11-7_515.105.01-1_amd64.deb
cuda-compat-11-8_520.61.05-1_amd64.deb
cuda-compat-12-0_525.60.13-1_amd64.deb
cuda-compat-12-0_525.85.12-1_amd64.deb
cuda-compat-12-0_525.105.17-1_amd64.deb
cuda-compat-12-1_530.30.02-1_amd64.deb
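A hedged way to reproduce this kind of comparison, assuming the NVIDIA CUDA apt repository (cuda-keyring) is already configured on the machine in question:

# List every cuda-compat package the configured repository offers
apt-get update
apt-cache search '^cuda-compat' | sort
# Empty output here means the repository carries no 470-branch compat driver
apt-cache madison cuda-compat-11-4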

I tried the rocker/ml:4.2.3 container, based on CUDA 11.8 (Ubuntu 22.04), but it does not work:

2023-04-14 18:04:34.360427: I tensorflow/compiler/xla/stream_executor/cuda/cuda_diagnostics.cc:169] retrieving CUDA diagnostic information for host: a5d8ae1ed794
2023-04-14 18:04:34.360439: I tensorflow/compiler/xla/stream_executor/cuda/cuda_diagnostics.cc:176] hostname: a5d8ae1ed794
2023-04-14 18:04:34.360546: I tensorflow/compiler/xla/stream_executor/cuda/cuda_diagnostics.cc:200] libcuda reported version is: 520.61.5
2023-04-14 18:04:34.360587: I tensorflow/compiler/xla/stream_executor/cuda/cuda_diagnostics.cc:204] kernel reported version is: 470.182.3
2023-04-14 18:04:34.360598: E tensorflow/compiler/xla/stream_executor/cuda/cuda_diagnostics.cc:313] kernel version 470.182.3 does not match DSO version 520.61.5 -- cannot find working devices in this configuration

I also tried the rocker/ml:4.2.1-cuda11.1 container, based on CUDA 11.1 (Ubuntu 20.04), but I found some problems:


Maybe a possible solution for CUDA 11 support on the 470 branch:

cboettig commented 1 year ago

@hute37 Thanks for digging into this, as you see, we're still figuring out the best strategy for handling CUDA versioning in this stack.

I haven't had a chance to investigate here; this is a great start, but we will need to dig a bit deeper still. As you know, there are at least three moving parts in the versioning scheme we need to triangulate:

Obviously rocker can only directly select versions in the third category. My issue up top referenced the second of these, but I think the right solution there is to recommend that users update the host drivers, rather than attempting to support all drivers. But on to your issue: it would definitely be nice to retain support for older hardware. I'm a bit puzzled why the cuda11.1 setup is not viable here, but it may be due more to how rocker/ml:4.2.1-cuda11.1 is built than to CUDA itself. As you've noticed, that version and prior versions of the rocker CUDA stack added the CUDA libs on top of the r-ver base image using custom scripts based on NVIDIA's containers (this was because, at the time, NVIDIA only provided ubuntu-18.04 base images), while for the current CUDA 11.8 images we instead use official NVIDIA CUDA Ubuntu-based images as our base image.

So rather than proliferate too many tags, it might be better to see about patching the cuda11.1 image correctly for this. For the libnvinfer issue, do you actually need libnvinfer8? IIRC, that release would have been aligned to libnvinfer7, no? I'm also not sure about the libdevice and cuDNN errors, but it's not clear to me that 11.1 vs 11.4 is really the source of the problem there.

hute37 commented 1 year ago

Maybe the question is easy to formulate, but the answer is not ...

Q: "Which is the latest tensorflow version that can be used (by CRAN-keras) with obsolete (470-driver line) NVIDIA gpus ?"

Because of stack dependencies, the answer depends on several sub-questions:


While NVIDIA declares full support for the 470 line until CUDA 12.2, some combinations are not available in the NVIDIA repositories, in particular for obsolete hardware.

I'd prefer apt-based installations, but maybe another installation method could fill the gaps?

I would like to avoid conda/miniconda stacks because I need to interoperate with projects based on pyenv/poetry python environments.


chatGPT wasn't helpful ...

cboettig commented 1 year ago

Thanks, this is definitely helpful.

NVIDIA obviously isn't making it easy for us by insisting that

while at the same time insisting that

The first choice makes sense to me, in that it allows users to run both older and newer software by staying current on their drivers.

The second choice seems unfortunate, and is basically saying that if you want to use old hardware you'll be stuck on old software too. (Obviously that's financially in the interest of a company selling new hardware and may contribute to Microsoft's choice here too).

So I think this also supports your formulation of the question: the only way forward on old hardware is to lock in an old version of all the software as well, including an old version of keras, tensorflow, and cuda toolkit. Does that sound accurate?

OK, so now for nuts and bolts. Given the above, I think it won't be viable to look for a solution that takes the default TensorFlow version of the current CRAN version of keras as the constraint -- it's not clear from the above that TensorFlow 2.11 was ever intended for a driver 470 / CUDA 11.1 / Ubuntu 20.04 environment?

I don't have a machine running the 470 drivers available, so I can't help much to check things here, but can you see about trying some earlier versions of TensorFlow? (In particular, I'm not clear on the history of the libnvinfer libs here; they may have been introduced only later?)

hute37 commented 1 year ago

It worked! But in a "manual (hammered)" non-containerized setup ...


One point is compatibility between the (host) kernel driver from the 470 line and the user-mode CUDA driver (from cuda-compat, prepended to LD_LIBRARY_PATH so it is found before the "standard" 5xx CUDA driver). For containers, the NVIDIA Container Toolkit is also required.
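A minimal sketch of that precedence trick, assuming the compat driver was unpacked under /usr/local/cuda/compat:

# Put the cuda-compat (470-branch) user-mode driver ahead of the default libcuda
export LD_LIBRARY_PATH=/usr/local/cuda/compat:${LD_LIBRARY_PATH}

# Verify which libcuda.so.1 the dynamic loader actually resolves
LD_DEBUG=libs python3 -c "import ctypes; ctypes.CDLL('libcuda.so.1')" 2>&1 | grep 'libcuda.so.1'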

With this setup, I could install the CUDA 11.8 libraries. I successfully tested a simple array multiplication performed on the GPU.
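For reference, a minimal sketch of such a check, assuming TensorFlow is installed in the active Python environment:

python3 - <<'EOF'
import tensorflow as tf

# The GPU must be visible before work can be placed on it
print("GPUs:", tf.config.list_physical_devices("GPU"))

with tf.device("/GPU:0"):
    a = tf.random.uniform((2000, 2000))
    b = tf.random.uniform((2000, 2000))
    c = tf.matmul(a, b)

print("result computed on:", c.device)
EOF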


Another very important topic is cuDNN support.

cuDNN is the library where the ML GPU magic happens, and TF is linked very tightly to it. cuDNN is built around GPU compute capabilities, identified by a level code.

For instance, Tesla (Kepler) K80 has 3.7 capability level.

Your hardware supports a fixed compute capability level; that fixes the maximum version of the cuDNN library that still supports the GPU, which in turn fixes the maximum version of TensorFlow that can be used on that system.
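One hedged way to read that chain from the bottom up is to ask TensorFlow itself, which reports the compute capability (when available) via tf.config.experimental.get_device_details:

python3 - <<'EOF'
import tensorflow as tf

for gpu in tf.config.list_physical_devices("GPU"):
    details = tf.config.experimental.get_device_details(gpu)
    # e.g. a Tesla K80 should report compute capability (3, 7)
    print(gpu.name, details.get("device_name"), details.get("compute_capability"))
EOF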

Some useful references:


A working configuration ...

» inxi -b
System:
  Host: xt-si701-v01 Kernel: 5.15.0-1035-azure x86_64 bits: 64
    Desktop: Xfce 4.16.0 Distro: Ubuntu 22.04.2 LTS (Jammy Jellyfish)
Machine:
  Type: Desktop Mobo: Microsoft model: Virtual Machine v: 7.0
    serial: <superuser required> BIOS: American Megatrends v: 090007
    date: 06/02/2017
CPU:
  Info: 6-core Intel Xeon E5-2690 v3 [MCP] speed (MHz): avg: 2597
Graphics:
  Device-1: Microsoft Hyper-V virtual VGA driver: hyperv_drm v: kernel
  Device-2: NVIDIA GK210GL [Tesla K80] driver: nvidia v: 470.182.03
  Display: x11 server: X.Org v: 1.20.9 driver: X: loaded: N/A
    unloaded: modesetting gpu: hyperv_drm note:  X driver n/a
    resolution: 1920x1080~60Hz
  OpenGL: renderer: llvmpipe (LLVM 15.0.6 256 bits) v: 4.5 Mesa 22.2.5

» lspci | grep NVIDIA
0001:00:00.0 3D controller: NVIDIA Corporation GK210GL [Tesla K80] (rev a1)

» nvidia-smi
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.182.03   Driver Version: 470.182.03   CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla K80           Off  | 00000001:00:00.0 Off |                    0 |
| N/A   38C    P0    81W / 149W |      0MiB / 11441MiB |     55%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

Drivers and Container Toolkit: Apt installed

» apt list --installed | grep -i -e nvidia -e cuda

WARNING: apt does not have a stable CLI interface. Use with caution in scripts.

cuda-keyring/unknown,now 1.0-1 all [installed]
libnvidia-cfg1-470/jammy-updates,jammy-security,now 470.182.03-0ubuntu0.22.04.1 amd64 [installed,automatic]
libnvidia-common-470/jammy-updates,jammy-security,now 470.182.03-0ubuntu0.22.04.1 all [installed,automatic]
libnvidia-compute-470/jammy-updates,jammy-security,now 470.182.03-0ubuntu0.22.04.1 amd64 [installed,automatic]
libnvidia-container-tools/unknown,now 1.13.0-1 amd64 [installed,automatic]
libnvidia-container1/unknown,now 1.13.0-1 amd64 [installed,automatic]
libnvidia-decode-470/jammy-updates,jammy-security,now 470.182.03-0ubuntu0.22.04.1 amd64 [installed,automatic]
libnvidia-egl-wayland1/jammy,now 1:1.1.9-1.1 amd64 [installed,automatic]
libnvidia-encode-470/jammy-updates,jammy-security,now 470.182.03-0ubuntu0.22.04.1 amd64 [installed,automatic]
libnvidia-extra-470/jammy-updates,jammy-security,now 470.182.03-0ubuntu0.22.04.1 amd64 [installed,automatic]
libnvidia-fbc1-470/jammy-updates,jammy-security,now 470.182.03-0ubuntu0.22.04.1 amd64 [installed,automatic]
libnvidia-gl-470/jammy-updates,jammy-security,now 470.182.03-0ubuntu0.22.04.1 amd64 [installed,automatic]
libnvidia-ifr1-470/jammy-updates,jammy-security,now 470.182.03-0ubuntu0.22.04.1 amd64 [installed,automatic]
mate-sensors-applet-nvidia/jammy,now 1.26.0-1 amd64 [installed]
nvidia-compute-utils-470/jammy-updates,jammy-security,now 470.182.03-0ubuntu0.22.04.1 amd64 [installed,automatic]
nvidia-container-toolkit-base/unknown,now 1.13.0-1 amd64 [installed]
nvidia-container-toolkit/unknown,now 1.13.0-1 amd64 [installed]
nvidia-dkms-470/jammy-updates,jammy-security,now 470.182.03-0ubuntu0.22.04.1 amd64 [installed,automatic]
nvidia-driver-470/jammy-updates,jammy-security,now 470.182.03-0ubuntu0.22.04.1 amd64 [installed]
nvidia-kernel-common-470/jammy-updates,jammy-security,now 470.182.03-0ubuntu0.22.04.1 amd64 [installed,automatic]
nvidia-kernel-source-470/jammy-updates,jammy-security,now 470.182.03-0ubuntu0.22.04.1 amd64 [installed,automatic]
nvidia-prime/jammy,now 0.8.17.1 all [installed,automatic]
nvidia-settings/unknown,now 530.30.02-0ubuntu1 amd64 [installed,automatic]
nvidia-utils-470/jammy-updates,jammy-security,now 470.182.03-0ubuntu0.22.04.1 amd64 [installed]
xserver-xorg-video-nvidia-470/jammy-updates,jammy-security,now 470.182.03-0ubuntu0.22.04.1 amd64 [installed,automatic]

Manual Installation in /usr/local/cuda

I manually downloaded installation packages:

cuda_11.8.0_520.61.05_linux.run
cuda-compat-11-4_470.182.03-1_amd64.deb
cudnn-linux-x86_64-8.6.0.163_cuda11-archive.tar.xz
libnvinfer8_8.5.3-1+cuda11.8_amd64.deb
libnvinfer-dev_8.5.3-1+cuda11.8_amd64.deb
libnvinfer-plugin8_8.5.3-1+cuda11.8_amd64.deb
libnvinfer-plugin-dev_8.5.3-1+cuda11.8_amd64.deb

| What | Version | Mode | Source |
|------|---------|------|--------|
| Kernel Driver | 470.182.03 | apt | nvidia-driver-470 |
| CUDA Driver | 470.* | local | cuda-compat-11-4_470.182.03-* |
| CUDA Libraries | 11-8 | local | cuda_11.8.0_520*_linux.run |
| libnvinfer | 8.5.3-* | local | libnvinfer8_8.5.3 (+plugin8, +dev) |
| cuDNN | 8.6.0.* | local | cudnn-linux-x86_64-8.6.0.163_cuda11-archive.tar.xz |
| TensorFlow | 2.12.0 | PyPI | poetry |

Also required under /etc/ld.so.conf.d

$ cd /etc/ld.so.conf.d
$ /etc/ld.so.conf.d# ls -1 1*cuda*
114_cuda-11-compat.conf
118_cuda-11-local.conf
$ cat 1*cuda*
/usr/local/cuda/compat
#/usr/local/cuda-11.8/lib64
/usr/local/cuda/lib64
#/usr/local/cuda-11.8/lib64

# update cache

sudo ldconfig

Check Libraries:

$ ldconfig -p | grep -i -e 'libcuda.so' -e 'lib..blas.*.so' -e 'libcudnn.so' -e 'libnvinfer.*.so' -e 'libcudnn.*.so'

    libnvinfer_plugin.so.8 (libc6,x86-64) => /usr/local/cuda/lib64/libnvinfer_plugin.so.8
    libnvinfer_plugin.so (libc6,x86-64) => /usr/local/cuda/lib64/libnvinfer_plugin.so
    libnvinfer.so.8 (libc6,x86-64) => /usr/local/cuda/lib64/libnvinfer.so.8
    libnvinfer.so (libc6,x86-64) => /usr/local/cuda/lib64/libnvinfer.so
    libnvblas.so.11 (libc6,x86-64) => /usr/local/cuda/lib64/libnvblas.so.11
    libnvblas.so (libc6,x86-64) => /usr/local/cuda/lib64/libnvblas.so
    libcudnn_ops_train.so.8 (libc6,x86-64) => /usr/local/cuda/lib64/libcudnn_ops_train.so.8
    libcudnn_ops_train.so (libc6,x86-64) => /usr/local/cuda/lib64/libcudnn_ops_train.so
    libcudnn_ops_infer.so.8 (libc6,x86-64) => /usr/local/cuda/lib64/libcudnn_ops_infer.so.8
    libcudnn_ops_infer.so (libc6,x86-64) => /usr/local/cuda/lib64/libcudnn_ops_infer.so
    libcudnn_cnn_train.so.8 (libc6,x86-64) => /usr/local/cuda/lib64/libcudnn_cnn_train.so.8
    libcudnn_cnn_train.so (libc6,x86-64) => /usr/local/cuda/lib64/libcudnn_cnn_train.so
    libcudnn_cnn_infer.so.8 (libc6,x86-64) => /usr/local/cuda/lib64/libcudnn_cnn_infer.so.8
    libcudnn_cnn_infer.so (libc6,x86-64) => /usr/local/cuda/lib64/libcudnn_cnn_infer.so
    libcudnn_adv_train.so.8 (libc6,x86-64) => /usr/local/cuda/lib64/libcudnn_adv_train.so.8
    libcudnn_adv_train.so (libc6,x86-64) => /usr/local/cuda/lib64/libcudnn_adv_train.so
    libcudnn_adv_infer.so.8 (libc6,x86-64) => /usr/local/cuda/lib64/libcudnn_adv_infer.so.8
    libcudnn_adv_infer.so (libc6,x86-64) => /usr/local/cuda/lib64/libcudnn_adv_infer.so
    libcudnn.so.8 (libc6,x86-64) => /usr/local/cuda/lib64/libcudnn.so.8
    libcudnn.so (libc6,x86-64) => /usr/local/cuda/lib64/libcudnn.so
    libcuda.so.1 (libc6,x86-64) => /usr/local/cuda/compat/libcuda.so.1
    libcuda.so.1 (libc6,x86-64) => /lib/x86_64-linux-gnu/libcuda.so.1
    libcuda.so (libc6,x86-64) => /usr/local/cuda/compat/libcuda.so
    libcuda.so (libc6,x86-64) => /lib/x86_64-linux-gnu/libcuda.so
    libcublasLt.so.11 (libc6,x86-64) => /usr/local/cuda/lib64/libcublasLt.so.11
    libcublasLt.so (libc6,x86-64) => /usr/local/cuda/lib64/libcublasLt.so
    libcublas.so.11 (libc6,x86-64) => /usr/local/cuda/lib64/libcublas.so.11
    libcublas.so (libc6,x86-64) => /usr/local/cuda/lib64/libcublas.so

A simple (python) test script:

» python exec/dummy-gpu-tf.py                                                                         
2023-04-19 18:32:53.603349: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.                                                                                 
Num GPUs Available:  1                             
GPUs:  [PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]                                                                                                                                    
2023-04-19 18:33:01.566433: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1635] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 10766 MB memory:  -> device: 0, name: Tesla K80, pci bus id: 0001:00:00.0, compute capability: 3.7

...

Epoch 1/15                                         
2023-04-19 18:33:03.653302: I tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:424] Loaded cuDNN version 8600                                                                                        
2023-04-19 18:33:04.129065: I tensorflow/compiler/xla/service/service.cc:169] XLA service 0x7f52e400cf70 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2023-04-19 18:33:04.129092: I tensorflow/compiler/xla/service/service.cc:177]   StreamExecutor device (0): Tesla K80, Compute Capability 3.7                                                                 
2023-04-19 18:33:04.135966: I tensorflow/compiler/mlir/tensorflow/utils/dump_mlir_util.cc:269] disabling MLIR crash reproducer, set env var `MLIR_CRASH_REPRODUCER_DIRECTORY` to enable.
2023-04-19 18:33:04.308010: I ./tensorflow/compiler/jit/device_compiler.h:180] Compiled cluster using XLA!  This line is logged at most once for the lifetime of the process.
422/422 [==============================] - 6s 8ms/step - loss: 0.3709 - accuracy: 0.8885 - val_loss: 0.0797 - val_accuracy: 0.9798                                                                           

...

Building a container is a different matter ...

Maybe a multi-stage container build could be used to grab the CUDA parts from different base images?
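A rough, untested sketch of that idea follows; the image tags and paths are assumptions, and libraries living outside /usr/local/cuda (cuDNN, libnvinfer, ...) would still need to be copied or installed separately:

cat > Dockerfile <<'EOF'
# Stage 1: an official NVIDIA image that ships the desired CUDA userspace
FROM nvidia/cuda:11.8.0-cudnn8-devel-ubuntu22.04 AS cuda

# Stage 2: a plain rocker base, with the CUDA toolkit copied in
FROM rocker/r-ver:4.2.3
COPY --from=cuda /usr/local/cuda-11.8 /usr/local/cuda-11.8
RUN ln -sfn /usr/local/cuda-11.8 /usr/local/cuda \
 && echo "/usr/local/cuda/lib64" > /etc/ld.so.conf.d/cuda.conf \
 && ldconfig
EOF
docker build -t r-ver-cuda-mix .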

hute37 commented 1 year ago

It worked!

Sources

Image Patch

Test Scripts

Test based on Tensorflow guide: "Use a GPU"

Configuration

Rocker-Project

| Base Image | Version |
|------------|---------|
| ml | 4.2.3 |
| ml-verse | 4.2.3 |

Hardware

| Virtual Machine | GPU Card | GPU Family | NVIDIA driver version | Compute capabilities |
|-----------------|----------|------------|-----------------------|----------------------|
| Azure NC6 | K80 | Tesla/Kepler | 470.* | 3.7 |
| Azure NV12 | M60 | Tesla/Maxwell | 470.* | 5.2 |

Host OS

| Component | Name | Version | Note |
|-----------|------|---------|------|
| Operating System | Ubuntu | 18.04, 22.04 | |
| Container Environ | Podman | 3.4.2 | rootless mode |
| Container Runtime | crun | | |
| Container Storage | fuse-overlay | | under Ubuntu 18.04, rootless mode requires fuse-overlay |
| NVIDIA Kernel Driver | nvidia-driver-470 | 470.182.03 | dkms, apt-get install from NVIDIA repo |
| NVIDIA Compute Driver | libnvidia-compute-470 | 470.182.03 | cuda-11, apt-get install from NVIDIA repo |
| NVIDIA Container Toolkit | nvidia-container-toolkit | 1.13.1-1 | cuda-11, apt-get install from NVIDIA repo |

Image CUDA Stack

| Component | Name | Version | Note |
|-----------|------|---------|------|
| Operating System | Ubuntu | 22.04 | |
| Framework | CUDA-11 | 11.8 | |
| "Compat" Libraries | - | - | removed from base image, in conflict with host container toolkit |
| CUDA Compiler | NVCC | - | installed, required for bytecode generation |
| ML DNN support | cuDNN | 8.6 | downgraded for GPU (3.7 level) compatibility support |
| BLAS | nvblas/cublas | - | unsupported; this library cannot be enabled in the build phase during install.r package setup |

Image Python Stack

| Component | Name | Version | Note |
|-----------|------|---------|------|
| Environment manager | pyenv | | |
| Language Interpreter | python | 3.10.6 | pyenv install |
| Package Manager | poetry | | pyproject.toml definition |
| ML Framework | TensorFlow | 2.11 | |
| ML Modelling | Keras | | |

cboettig commented 1 year ago

@hute37 Hey, well done, that's pretty cool! So it looks like dropping the compat libraries and rolling cuDNN back to 8.6 was key? Nicely written install script, thanks for sharing!

hute37 commented 1 year ago

@cboettig

Matching the cuDNN compute-capability requirements with your GPU is critical. NVIDIA declares support for the 3.5+ level (and the 470 driver) for all CUDA 11 releases, but later components seem to break that support.

The NVCC compiler is also a requirement: I have read something about removal of the byte-code cache for some GPUs, so the compiler must be present to regenerate the cache on the fly (with a noticeable startup delay?).

libcuda-compat is a strange thing ... Early CUDA support in containers directly exported the GPU device, which was handled inside the container. The recent model (NVIDIA Container Toolkit) keeps the image agnostic with respect to GPU model and availability (in a CI/CD pipeline, many systems share the same image with different GPUs). It is critical that the host running the container "injects" the right driver (mount) at runtime. So far, so good ... The strange fact was that, if the compat libraries were present in the container, the internal driver (520) took precedence over the injected one (470).
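A hedged sketch of that fix for a derived image (the package glob and path are assumptions; the point is only that no bundled compat libcuda should shadow the injected one):

# Remove the bundled cuda-compat user-mode driver so the libcuda.so.1
# mounted in by nvidia-container-toolkit at runtime is the one that gets loaded
apt-get purge -y 'cuda-compat-*' || true
rm -rf /usr/local/cuda/compat
ldconfig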

One thing is still missing: BLAS/LAPACK support, ...

I tried to patch the R/Rscript Renviron to enable the LD_PRELOAD trick of linking the nvblas/cublas library in front of the standard OpenBLAS (what about MKL?). The problem here is that the GPU is not available during the image build phase, so all install.r calls generated a lot of warnings. Runtime inclusion of nvblas would be better but is rather complex ...

In terms of support lines, maybe a cuda-11 (470/520-driver-compatible) base image line could be a nice addition, while the "latest and greatest" images could support cuda-12+ and driver 520 only.

benz0li commented 1 year ago

Runtime inclusion of nvblas would be better but is rather complex ...

To provide NVBLAS-enabled R and Rscript:

# Keep the original binaries available as R_ and Rscript_, then replace
# R and Rscript with thin wrappers that preload NVBLAS only when a GPU is visible
cp -a $(which R) $(which R)_
echo '#!/bin/bash' > $(which R)
echo "command -v nvidia-smi >/dev/null && nvidia-smi -L | grep 'GPU[[:space:]]\?[[:digit:]]\+' >/dev/null && export LD_PRELOAD=libnvblas.so" >> $(which R)
echo "$(which R_) \"\${@}\"" >> $(which R)

cp -a $(which Rscript) $(which Rscript)_
echo '#!/bin/bash' > $(which Rscript)
echo "command -v nvidia-smi >/dev/null && nvidia-smi -L | grep 'GPU[[:space:]]\?[[:digit:]]\+' >/dev/null && export LD_PRELOAD=libnvblas.so" >> $(which Rscript)
echo "$(which Rscript_) \"\${@}\"" >> $(which Rscript)

👉 Enabled at runtime and only if nvidia-smi and at least one GPU are present.

(LD_LIBRARY_PATH=/usr/local/nvidia/lib:/usr/local/nvidia/lib64:/usr/local/cuda/lib:/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64)

Run some benchmarks to ensure that NVBLAS actually outperforms OpenBLAS.
ℹ️ In some of my setups it does not. That is why I decided to provide NVBLAS-enabled R_ and Rscript_ in addition to the default R and Rscript.
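A quick, hedged comparison (assumes libnvblas.so is on the loader path and /etc/nvblas.conf points at a CPU fallback BLAS):

# Default BLAS (e.g. OpenBLAS)
Rscript -e 'n <- 4000; a <- matrix(rnorm(n * n), n); print(system.time(a %*% a))'

# Same multiplication with NVBLAS preloaded
LD_PRELOAD=libnvblas.so Rscript -e 'n <- 4000; a <- matrix(rnorm(n * n), n); print(system.time(a %*% a))'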

References:

hute37 commented 1 year ago

I found an issue in BLAS configuration.

In the base nvidia-cuda image, the OpenBLAS libraries, while installed, are disabled in the /etc/alternatives configuration.

In the image, with nvblas/cublas enabled, the system BLAS config is reset to the basic (slow) libraries, but /etc/nvblas.conf wraps the (fast) OpenBLAS version:

NVBLAS_CPU_BLAS_LIB /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3

To reset the BLAS configuration, I had to include in my setup:

update-alternatives --auto    libblas.so.3-x86_64-linux-gnu     
update-alternatives --auto    liblapack.so.3-x86_64-linux-gnu 

To check (from a container shell):

Rscript -e 'sessionInfo()' | grep -e 'BLAS' -e 'LAPACK'

References:

cboettig commented 1 year ago

@hute37 IIRC, BLAS was intentionally turned off by default due to https://github.com/rocker-org/rocker-versioned2/issues/471, which I believe was traced to an open issue with how either numpy or the openblas libraries handled suffixes on their symbols; see https://github.com/numpy/numpy/issues/21643.

Although the numpy issue thread is still open, I believe that issue impacted only the older openblas libraries on Ubuntu 20.04, and that the newer openblas on 22.04 was not affected. @hute37 would you be able to quickly test that, e.g. that the reprex in https://github.com/rstudio/reticulate/issues/1190 no longer segfaults when you enable openblas?

@eitsupi Do you think we could turn openblas config back on by default for 22.04 cuda images while leaving it off for the 20.04 images?

eitsupi commented 1 year ago

@eitsupi Do you think we could turn openblas config back on by default for 22.04 cuda images while leaving it off for the 20.04 images?

Sure. I think we just need to add Ubuntu 20.04 to the conditions in the following section.

https://github.com/rocker-org/rocker-versioned2/blob/8279ff1f01eb1c9d58ee1a72f7821033253a4838/scripts/install_python.sh#L43-L50
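For illustration only, a hypothetical sketch of the kind of version guard being discussed (not the actual install_python.sh code):

. /etc/os-release
if [ "${VERSION_ID}" = "20.04" ]; then
    # leave the OpenBLAS alternatives disabled on 20.04 (numpy symbol-suffix issue)
    :
else
    update-alternatives --auto libblas.so.3-x86_64-linux-gnu
    update-alternatives --auto liblapack.so.3-x86_64-linux-gnu
fi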

hute37 commented 1 year ago

This test runs without any errors under this configuration:

> sessionInfo()

R version 4.3.0 (2023-04-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 22.04.1 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so;  LAPACK version 3.10.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

time zone: Europe/Rome
tzcode source: system (glibc)

...

> system('update-alternatives --display libblas.so.3-x86_64-linux-gnu; update-alternatives --display liblapack.so.3-x86_64-linux-gnu')
libblas.so.3-x86_64-linux-gnu - auto mode
  link best version is /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
  link currently points to /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
  link libblas.so.3-x86_64-linux-gnu is /usr/lib/x86_64-linux-gnu/libblas.so.3
/usr/lib/x86_64-linux-gnu/blas/libblas.so.3 - priority 10
/usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 - priority 100
liblapack.so.3-x86_64-linux-gnu - auto mode
  link best version is /usr/lib/x86_64-linux-gnu/openblas-pthread/liblapack.so.3
  link currently points to /usr/lib/x86_64-linux-gnu/openblas-pthread/liblapack.so.3
  link liblapack.so.3-x86_64-linux-gnu is /usr/lib/x86_64-linux-gnu/liblapack.so.3
/usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3 - priority 10
/usr/lib/x86_64-linux-gnu/openblas-pthread/liblapack.so.3 - priority 100

> system('apt list --installed | grep -i -e blas -e lapack ')

WARNING: apt does not have a stable CLI interface. Use with caution in scripts.

libblas-dev/now 3.10.0-2ubuntu1 amd64 [installed,local]
libblas3/now 3.10.0-2ubuntu1 amd64 [installed,local]
libcublas-11-8/now 11.11.3.6-1 amd64 [installed,local]
libcublas-dev-11-8/now 11.11.3.6-1 amd64 [installed,local]
libgslcblas0/now 2.7.1+dfsg-3 amd64 [installed,local]
liblapack-dev/now 3.10.0-2ubuntu1 amd64 [installed,local]
liblapack3/now 3.10.0-2ubuntu1 amd64 [installed,local]
libopenblas-dev/now 0.3.20+ds-1 amd64 [installed,local]
libopenblas-pthread-dev/now 0.3.20+ds-1 amd64 [installed,local]
libopenblas0-pthread/now 0.3.20+ds-1 amd64 [installed,local]
libopenblas0/now 0.3.20+ds-1 amd64 [installed,local]

> system('inxi')
CPU: 6-core Intel Xeon E5-2690 v3 (-MCP-)  speed: 2597 MHz  Kernel: 5.15.0-1039-azure x86_64  Up: 6h 28m
Mem: 12268.8/56218.3 MiB (21.8%)  Storage: 1.08 TiB (28.9% used)  Procs: 8

> system('python --version; pyenv --version; poetry --version')
Python 3.10.6
pyenv 2.3.18
Poetry (version 1.5.1)

> system('poetry show | grep -e ^numpy -e ^matplotlib -e ^pip -e ^setuptools')
matplotlib                    3.7.1         Python plotting package
matplotlib-inline             0.1.6         Inline Matplotlib backend for J...
numpy                         1.23.5        NumPy is the fundamental packag...
pip                           23.1          The PyPA recommended tool for i...
setuptools                    67.6.1        Easily download, build, install...

> 

I didn't test with the nvblas/cublas libraries ...

cboettig commented 1 year ago

Nice! Thanks @hute37 for testing and @eitsupi for the PR, great work!