ublue-os / hwe

Fedora variants with support for ASUS devices, Nvidia devices, and Surface laptops
https://universal-blue.org/images/hwe
Apache License 2.0

cuda toolkit not installed for user #198

Open 81reap opened 7 months ago

81reap commented 7 months ago

Steps To Recreate

  1. Perform a clean install of bazzite-nvidia.
  2. Login as the user.
  3. Check for the CUDA toolkit by running nvcc --version. The shell fails to find the command.
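
The check in step 3 can be extended to tell "not installed" apart from "installed but not on PATH". This is only a rough sketch: locate_nvcc is a hypothetical helper name, and the default search root /usr/local is an assumption based on where the runfile installer in the workaround puts the toolkit.

```shell
# Hypothetical helper: report where nvcc is, if anywhere.
# $1 is the directory to search for cuda*/bin/nvcc (assumed default: /usr/local).
locate_nvcc() {
    root=${1:-/usr/local}
    # First case: nvcc is already reachable through PATH.
    if command -v nvcc >/dev/null 2>&1; then
        echo "on PATH: $(command -v nvcc)"
        return 0
    fi
    # Second case: a toolkit directory exists but PATH does not include it.
    for candidate in "$root"/cuda*/bin/nvcc; do
        if [ -x "$candidate" ]; then
            echo "installed but not on PATH: $candidate"
            return 0
        fi
    done
    # Third case: no toolkit found under the search root.
    echo "not found"
    return 1
}

# Usage:
# locate_nvcc
```

On a clean install as described here, the expected result would be "not found", since the image ships the driver libraries but not the toolkit.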

Expected Behavior

The installed packages and nvidia-smi suggest that CUDA and the CUDA toolkit are installed; however, nvcc --version fails because the command cannot be found.

reap@fedora:~$ nvidia-smi
Thu Feb 22 18:58:12 2024       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 545.29.06              Driver Version: 545.29.06    CUDA Version: 12.3     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA RTX 4000 SFF Ada ...    Off | 00000000:01:00.0 Off |                  Off |
| 30%   33C    P8               5W /  70W |      2MiB / 20475MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+

reap@fedora:~$ rpm -qa | grep nvidia
nvidia-gpu-firmware-20240115-2.fc39.noarch
ublue-os-nvidia-addons-0.10-1.fc39.noarch
xorg-x11-drv-nvidia-cuda-libs-545.29.06-2.fc39.x86_64
nvidia-modprobe-545.29.06-1.fc39.x86_64
nvidia-persistenced-545.29.06-1.fc39.x86_64
nvidia-container-toolkit-base-1.14.5-1.x86_64
libnvidia-container1-1.14.5-1.x86_64
libnvidia-container-tools-1.14.5-1.x86_64
nvidia-container-toolkit-1.14.5-1.x86_64
xorg-x11-drv-nvidia-kmodsrc-545.29.06-2.fc39.x86_64
libva-nvidia-driver-0.0.11-1.fc39.x86_64
xorg-x11-drv-nvidia-libs-545.29.06-2.fc39.i686
xorg-x11-drv-nvidia-libs-545.29.06-2.fc39.x86_64
nvidia-settings-545.29.06-1.fc39.x86_64
xorg-x11-drv-nvidia-power-545.29.06-2.fc39.x86_64
kmod-nvidia-6.7.5-201.fsync.fc39.x86_64-545.29.06-3.fc39.x86_64
xorg-x11-drv-nvidia-545.29.06-2.fc39.x86_64
xorg-x11-drv-nvidia-cuda-libs-545.29.06-2.fc39.i686
xorg-x11-drv-nvidia-cuda-545.29.06-2.fc39.x86_64
xorg-x11-drv-nvidia-devel-545.29.06-2.fc39.x86_64

reap@fedora:~$ nvcc --version
# only works after the workaround

Hardware

B550I Aorus Pro AX, AMD Ryzen 7 5700G, Nvidia RTX 4000 SFF Ada Gen, 2x32GB @ 3200 MHz, 2TB NVMe drive

Setup Notes

The Workaround

note :: The workaround does not fix the issue for podman containers running with CDI. Any workload that requires CUDA has to run on the host rather than in a container.

note :: you may need to change the cuda version in these commands. See here

$ nvidia-smi
# this shows the correct output and says that cuda 12.3 is installed
$ nvcc --version
# this should fail to find nvcc
$ ls /usr/local
# this output does not contain a cuda directory, which confirms that the cuda toolkit is not installed

$ wget https://developer.download.nvidia.com/compute/cuda/12.3.2/local_installers/cuda_12.3.2_545.23.08_linux.run
$ sudo sh cuda_12.3.2_545.23.08_linux.run
# this will require you to accept the licence first. Install only the cuda toolkit and deselect the driver, as the system already has nvidia drivers.
$ ls /usr/local
# now we have the cuda toolkit, but nvcc will still fail as it is not on your path

# add these two lines (without the leading $) to your ~/.bashrc so that they are applied in every new shell
$ export PATH=/usr/local/cuda-12.3/bin${PATH:+:${PATH}}
$ export LD_LIBRARY_PATH=/usr/local/cuda-12.3/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
$ nvcc --version 
# nvcc now works
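
Pasting the exports into ~/.bashrc by hand works, but running a setup script twice would duplicate them. A minimal sketch of an idempotent append follows; append_once and the two variable names are hypothetical, and the 12.3 in the paths is taken from the workaround above and may need adjusting.

```shell
# Hypothetical helper: append a line to a file only if that exact line
# is not already present.
append_once() {
    file=$1
    line=$2
    touch "$file"
    grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Single quotes keep ${PATH...} unexpanded here, so the expansion happens
# each time the shell starts rather than once when the line is written.
CUDA_PATH_LINE='export PATH=/usr/local/cuda-12.3/bin${PATH:+:${PATH}}'
CUDA_LIB_LINE='export LD_LIBRARY_PATH=/usr/local/cuda-12.3/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}'

# Usage (uncomment to modify your own shell profile):
# append_once "$HOME/.bashrc" "$CUDA_PATH_LINE"
# append_once "$HOME/.bashrc" "$CUDA_LIB_LINE"
```

Open a new terminal or run source ~/.bashrc afterwards so the current shell picks up the change.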

Related Issues

h7io commented 3 months ago

I am attempting to get gstreamer's cudaconvert and cudascale pipeline elements working that also depend on the cuda toolkit.

To list the elements available I run gst-inspect-1.0 nvcodec, which currently returns:

> gst-inspect-1.0 nvcodec
Plugin Details:
  Name                     nvcodec
  Description              GStreamer NVCODEC plugin
  Filename                 /usr/lib64/gstreamer-1.0/libgstnvcodec.so
  Version                  1.24.4
  License                  LGPL
  Source module            gst-plugins-bad
  Documentation            https://gstreamer.freedesktop.org/documentation/nvcodec/
  Source release date      2024-05-29
  Binary package           Fedora GStreamer-plugins-bad package
  Origin URL               http://download.fedoraproject.org

  Info: CUDA runtime compilation library "libnvrtc.so" was not found, check CUDA toolkit package installation

  cudadownload: CUDA downloader
  cudaipcsink: CUDA IPC Sink
  cudaipcsrc: CUDA IPC Src
  cudaupload: CUDA uploader
  nvautogpuh264enc: NVENC H.264 Video Encoder Auto GPU select Mode
  nvautogpuh265enc: NVENC H.265 Video Encoder Auto GPU select Mode
  nvav1dec: NVDEC AV1 Decoder
  nvcudah264enc: NVENC H.264 Video Encoder CUDA Mode
  nvcudah265enc: NVENC H.265 Video Encoder CUDA Mode
  nvh264dec: NVDEC H.264 Decoder
  nvh264enc: NVENC H.264 Video Encoder
  nvh265dec: NVDEC H.265 Decoder
  nvh265enc: NVENC HEVC Video Encoder
  nvjpegdec: NVDEC jpeg Video Decoder
  nvmpeg2videodec: NVDEC mpeg2video Video Decoder
  nvmpeg4videodec: NVDEC mpeg4video Video Decoder
  nvmpegvideodec: NVDEC mpegvideo Video Decoder
  nvvp8dec: NVDEC VP8 Decoder
  nvvp9dec: NVDEC VP9 Decoder

  19 features:
  +-- 19 elements

i.e. cudaconvert and cudascale are missing from the list because of the error Info: CUDA runtime compilation library "libnvrtc.so" was not found, check CUDA toolkit package installation.

After applying the workaround from the top comment, I see that libnvrtc.so is available at /usr/local/cuda/targets/x86_64-linux/lib, but gstreamer still does not find it. Adding the exports to .bashrc didn't help.

EDIT: cudaconvert and cudascale have started appearing in that list. Clearing the gstreamer cache with rm -rf ~/.cache/gstreamer-1.0/ may be what helped.
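
To confirm whether the two elements have appeared after clearing the cache, the gst-inspect-1.0 listing can be filtered instead of read by eye. This is a sketch under assumptions: missing_cuda_elements is a hypothetical helper, and it relies on the "  name: description" line format shown in the output above.

```shell
# Hypothetical helper: read `gst-inspect-1.0 nvcodec` output on stdin and
# print each expected element that is NOT listed. Prints nothing when both
# cudaconvert and cudascale are present.
missing_cuda_elements() {
    listing=$(cat)
    for element in cudaconvert cudascale; do
        printf '%s\n' "$listing" | grep -q "^ *${element}:" || echo "$element"
    done
}

# Usage, after clearing the cache:
# rm -rf ~/.cache/gstreamer-1.0/
# gst-inspect-1.0 nvcodec | missing_cuda_elements
```

Empty output means both elements were registered. If either name is printed, the libnvrtc.so message is probably still present in the full listing.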

mubinulhaque commented 1 month ago

For those who are attempting to do this now, you should know that you have to use the version of CUDA that is appropriate for both your GPU driver's version and the version of CUDA that your GPU supports.

For example, as of writing, the GTX 1650 SUPER supports CUDA 12.6 and is on driver version 560.35.03. That means you have to use:

wget https://developer.download.nvidia.com/compute/cuda/12.6.1/local_installers/cuda_12.6.1_560.35.03_linux.run

Keep in mind that, if your GPU supports 12.6, then it supports any 12.6.x version, and thus you should download the highest 12.6.x version you can get. You should format the rest of the commands as follows:

sudo sh cuda_12.6.1_560.35.03_linux.run
export PATH=/usr/local/cuda-12.6/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-12.6/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}

You may notice that the last two commands include 12.6 rather than 12.6.1. This is normal: the install directory is named after the major and minor version only, so do not include the patch number (the part after the second dot).
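
The rule about dropping the patch number can be written down as a small function. This is a sketch only: cuda_prefix is a hypothetical name, and it assumes the toolkit went to the /usr/local/cuda-<major>.<minor> directory used in the commands above.

```shell
# Hypothetical helper: turn a full CUDA version such as 12.6.1 into the
# install directory used in the export lines (major.minor only).
cuda_prefix() {
    major_minor=$(printf '%s\n' "$1" | cut -d. -f1,2)
    printf '/usr/local/cuda-%s\n' "$major_minor"
}

# Usage:
# export PATH="$(cuda_prefix 12.6.1)/bin${PATH:+:${PATH}}"
```

A version that already has only two parts, such as 12.3, passes through unchanged.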

Like @81reap says, you should add the last two commands to your .bashrc file. This is located in your Home directory in Dolphin File Explorer. If you can't see it, enable Show Hidden Files.

Remember: this is different for each GPU. Use nvidia-smi to find your GPU driver's version and its CUDA support, so you can adapt the above instructions for your hardware.
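
The two numbers can also be pulled out of the nvidia-smi banner with a short filter rather than copied by hand. This sketch assumes the banner format shown in the first comment ("Driver Version: ... CUDA Version: ..."); smi_versions is a hypothetical helper name.

```shell
# Hypothetical helper: read nvidia-smi output on stdin and print
# "<driver version> <CUDA version>" taken from the banner line.
smi_versions() {
    sed -n 's/.*Driver Version: *\([0-9.]*\).*CUDA Version: *\([0-9.]*\).*/\1 \2/p'
}

# Usage:
# nvidia-smi | smi_versions
```

The output still has to be matched against the installer list by hand, because the patch version in the file name is not part of what nvidia-smi reports.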