metal-stack / metal-api

API to manage and control plane resources like machines, switches, operating system images, machine sizes, networks, IP addresses and more
GNU Affero General Public License v3.0

Initial GPU Support #512

Closed by majst01 5 months ago

majst01 commented 6 months ago

Dependent PRs:

We need to test a scenario where multiple pods need to access the same GPU:

There are several documents from NVIDIA on this topic:

https://developer.nvidia.com/blog/improving-gpu-utilization-in-kubernetes
https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/latest/gpu-operator-mig.html
https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/latest/gpu-sharing.html
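For the shared-GPU scenario, one possible test setup is time-slicing as described in the gpu-sharing document. The following is only a rough sketch under assumptions: the ConfigMap name, the `any` key, the namespace, the replica count of 4, and the workload image are all placeholders, and nothing here says whether time-slicing or MIG is the better fit for our machines. The GPU operator also has to be pointed at this ConfigMap, as explained in the linked document.

```yaml
# Assumed time-slicing config for the NVIDIA device plugin.
# Name, namespace and key ("any") are placeholders.
apiVersion: v1
kind: ConfigMap
metadata:
  name: time-slicing-config
  namespace: gpu-operator
data:
  any: |-
    version: v1
    sharing:
      timeSlicing:
        resources:
        - name: nvidia.com/gpu
          replicas: 4   # one physical GPU is advertised as 4 schedulable GPUs
---
# Two pods that each request one (time-sliced) GPU.
# On a node with a single physical GPU, both should be scheduled there.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: gpu-sharing-test
spec:
  replicas: 2
  selector:
    matchLabels:
      app: gpu-sharing-test
  template:
    metadata:
      labels:
        app: gpu-sharing-test
    spec:
      containers:
      - name: workload
        image: example.invalid/gpu-workload:latest  # placeholder image
        resources:
          limits:
            nvidia.com/gpu: 1
```

If both pods reach `Running` on a node with one physical GPU, the sharing path works at the scheduling level. Note that time-slicing provides no memory or fault isolation between the pods, so this would not cover the MIG case from the second link.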