ublue-os / ucore

An OCI base image of Fedora CoreOS with batteries included
https://projectucore.io
Apache License 2.0

fix: symlink ldconfig to ldconfig.real for gpu-operator support #154

Open jeefy opened 2 months ago

jeefy commented 2 months ago

Currently, NVIDIA's GPU Operator expects ldconfig.real to exist. See https://github.com/NVIDIA/nvidia-container-toolkit/issues/147 for more info.

As a short-term workaround, you can modify /usr/local/nvidia/toolkit/.config/nvidia-container-runtime/config.toml to point to /sbin/ldconfig. However, any time the pods cycle or the node reboots, the file is regenerated and points to the incorrect ldconfig again.
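
For reference, the manual workaround could be scripted roughly as follows. This is only a sketch: the helper name `patch_ldconfig_path` is hypothetical, and it assumes the generated entry looks like `ldconfig = "@/sbin/ldconfig.real"` (with or without the leading `@`) and that GNU sed is available. The edit is lost as soon as the toolkit regenerates the file.

```shell
# Hypothetical helper: point the toolkit config at /sbin/ldconfig
# instead of /sbin/ldconfig.real.
# Assumption: the entry has the form  ldconfig = "@/sbin/ldconfig.real"
patch_ldconfig_path() {
    local config="$1"
    # Keep everything up to the opening quote (and optional "@"),
    # then swap the path. Uses GNU sed in-place editing.
    sed -i 's|^\([[:space:]]*ldconfig[[:space:]]*=[[:space:]]*"@\{0,1\}\)/sbin/ldconfig\.real"|\1/sbin/ldconfig"|' "$config"
}

# Usage (as root, on the affected node):
# patch_ldconfig_path /usr/local/nvidia/toolkit/.config/nvidia-container-runtime/config.toml
```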

bsherman commented 2 months ago

Odd bug... but reading the report, it seems to be an artifact of Ubuntu-first packaging support.

Thank you for the contribution @jeefy, but as this is NVIDIA-specific (at least, it seems to be), I want to scope it a bit more.

What are your thoughts on making it a post-install step for our ucore nvidia RPM? (See: https://github.com/ublue-os/ucore-kmods/blob/main/ublue-os-ucore-nvidia.spec)

An alternative, though: maybe add the symlink, with a comment and explanation, here? https://github.com/ublue-os/ucore/blob/main/ucore/install-ucore-minimal.sh#L46
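
Either way, the step itself would be small. A minimal sketch of what it might look like, whether it ends up in the RPM's post-install scriptlet or in the install script: the function name `ensure_ldconfig_real` and the optional root-prefix argument are hypothetical (the prefix just makes it easy to try against a scratch directory), and it assumes /sbin resolves to /usr/sbin as it does on Fedora.

```shell
# NVIDIA's GPU Operator toolkit expects /sbin/ldconfig.real to exist
# (see NVIDIA/nvidia-container-toolkit#147), so provide it as a symlink.
# Hypothetical helper; the optional argument is a root prefix for testing.
ensure_ldconfig_real() {
    local root="${1:-}"
    local sbin="${root}/usr/sbin"
    # Idempotent: do nothing if the path already exists, even as a dangling link.
    if [ ! -e "${sbin}/ldconfig.real" ] && [ ! -L "${sbin}/ldconfig.real" ]; then
        # Relative target, so the link stays valid under any root prefix.
        ln -s ldconfig "${sbin}/ldconfig.real"
    fi
}

# Usage at image build time:
# ensure_ldconfig_real
```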

Edit: added alternative thought