siderolabs / extensions

Talos Linux System Extensions

Feature request: add nvproxy to gVisor if Nvidia driver is included #413

Open: stevefan1999-personal opened this issue 2 weeks ago

stevefan1999-personal commented 2 weeks ago

gVisor should have out-of-the-box support for most GPU workloads through nvproxy.

The guide can be found here: https://gvisor.dev/docs/user_guide/gpu/

stevefan1999-personal commented 2 weeks ago

Looks like the situation is a little more complicated. In principle, suppose the NVIDIA driver is installed on Talos as an extension and k8s-device-plugin is configured to use CDI. Once the GPU is detected, either by NFD or by manually adding the required node label, k8s-device-plugin should be able to generate the CDI spec, advertise the node's GPUs for allocation, and inject the needed driver files into the container when the pod is created. Therefore, the following setting in the k8s-device-plugin Helm chart values should work:

deviceListStrategy: cdi-cri
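A minimal sketch of a values file carrying that setting for the NVIDIA k8s-device-plugin Helm chart. The file name and release name are hypothetical; only `deviceListStrategy: cdi-cri` comes from the text above, and the accepted keys may differ between chart versions.

```yaml
# values.yaml (hypothetical name) for the NVIDIA k8s-device-plugin Helm chart.
# Installed with, for example:
#   helm repo add nvdp https://nvidia.github.io/k8s-device-plugin
#   helm install nvdp nvdp/nvidia-device-plugin -f values.yaml
#
# Hand allocated GPUs to the container runtime as fully qualified CDI device
# names over the CRI, rather than via environment variables or volume mounts.
deviceListStrategy: cdi-cri
```

If NFD is not deployed, the chart's default node affinity (in recent versions) also matches nodes labelled `nvidia.com/gpu.present=true`, which on Talos could be applied through `machine.nodeLabels` in the machine config. This is the "add the label forcefully" route mentioned above.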

Then, according to https://github.com/google/gvisor/issues/9368#issuecomment-2040992201, gVisor should be able to load the CDI spec generated by k8s-device-plugin and inject the devices into the Sentry.
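To exercise that path, a workload would request a GPU from the device plugin and select the gVisor runtime. The sketch below assumes a RuntimeClass named `gvisor` mapped to the `runsc` containerd handler, and assumes runsc on the node is already running with nvproxy enabled as described in the gVisor GPU guide. The pod name and image tag are placeholders, not tested values.

```yaml
# RuntimeClass that routes pods to the runsc (gVisor) containerd handler.
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: gvisor            # assumed name; any name works if pods reference it
handler: runsc
---
# Hypothetical test pod: asks k8s-device-plugin for one GPU and runs under gVisor.
apiVersion: v1
kind: Pod
metadata:
  name: gpu-gvisor-test   # placeholder name
spec:
  runtimeClassName: gvisor
  restartPolicy: Never
  containers:
    - name: cuda
      image: nvidia/cuda:12.4.1-base-ubuntu22.04   # placeholder image tag
      command: ["nvidia-smi"]
      resources:
        limits:
          nvidia.com/gpu: 1   # extended resource advertised by k8s-device-plugin
```

Whether this pod actually sees the GPU depends on the open question below: containerd on the node must accept the CDI device names that the plugin returns.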

But it is not clear whether CDI works on Talos in the first place.