gadkins opened 1 year ago

Apologies if Issues is the wrong place for my question, but I don't see a Discussions forum for this repo.

I've read your Medium article, which provides a nice summary of the problem `nvshare` solves. However, I also came across a blog post from VMware, which describes GPU virtualization in Kubernetes via Nvidia's GPU Operator and GPU Feature Discovery, which adds labels to the Nodes such as `nvidia.com/vgpu.present=true` and facilitates fractional allocation of GPUs to Pods.

How does `nvshare` differ, and/or what additional value does it provide?

grgalex replied:

@gadkins
The GPU Operator and Feature Discovery are auxiliary mechanisms that make it easier to manage GPUs in a K8s cluster. They make life easier by automatically installing the Nvidia drivers and Device Plugin (in the case of the Operator) and by automatically adding labels/taints to nodes that have GPUs (in the case of Feature Discovery).
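For reference, you can see the kind of labels Feature Discovery attaches by inspecting a node (a minimal sketch; the node name is a placeholder and the exact label set depends on your cluster):

```sh
# Print the nvidia.com/* labels that GPU Feature Discovery added to a node.
# "my-gpu-node" is a placeholder; substitute one of your node names.
kubectl get node my-gpu-node -o jsonpath='{.metadata.labels}' | tr ',' '\n' | grep 'nvidia.com'
```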
AFAIK, Nvidia offers two mechanisms for sharing a GPU between multiple containers: (1) MIG and (2) time-slicing. Both are exposed to a Kubernetes cluster through the official device plugin [1].
1. MIG: This requires special hardware (Ampere architecture GPUs). The GPU's hardware is segmented in a way that allows the driver to offer "true" splits of the GPU as independent devices. You can skim through the official docs [2] for an overview of how that works.
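As a rough sketch of what MIG partitioning looks like on the node itself (assuming an A100; profile IDs differ across GPU models):

```sh
# Enable MIG mode on GPU 0 (a GPU reset may be needed for it to take effect).
sudo nvidia-smi -i 0 -mig 1

# List the GPU instance profiles the GPU supports.
sudo nvidia-smi mig -lgip

# Split the GPU into two instances (19 is the 1g.5gb profile on an A100-40GB);
# -C also creates the corresponding compute instances.
sudo nvidia-smi mig -cgi 19,19 -C
```

Each resulting MIG device can then be advertised to Kubernetes as a separate resource by the device plugin.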
2. Time-slicing: NVIDIA device plugin 0.12.0 officially provides an option to enable sharing a GPU between multiple containers (https://developer.nvidia.com/blog/improving-gpu-utilization-in-kubernetes/). Under the hood, this (a) relies on the GPU scheduler's time-slicing, whose time-slice duration can be tuned (DEFAULT, SHORT, MEDIUM, LONG), and (b) advertises multiple `nvidia.com/gpu` devices for every physical GPU. Memory is still the core problem. Quoting them:
> The tradeoffs with time-slicing are increased latency, jitter, and potential out-of-memory (OOM) conditions when many different applications are time-slicing on the GPU.
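For concreteness, the over-advertisement is configured roughly like this (a sketch based on the blog post above; the ConfigMap name, namespace, and replica count are placeholders):

```sh
# Tell the device plugin to advertise 4 nvidia.com/gpu devices per physical GPU.
# ConfigMap name/namespace and the replica count are illustrative.
cat <<'EOF' | kubectl apply -n kube-system -f -
apiVersion: v1
kind: ConfigMap
metadata:
  name: nvidia-device-plugin-config
data:
  config.yaml: |-
    version: v1
    sharing:
      timeSlicing:
        resources:
        - name: nvidia.com/gpu
          replicas: 4
EOF
```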
This simply solves the 1-1 assignment on K8s and does nothing to prevent OOM and friction between co-located apps.

To get an experience akin to (2) with `nvshare`, turn the `nvshare-scheduler` OFF through the CLI. This falls back to the default "CUDA black-box" scheduling; instead of OOM, processes may then thrash the GPU.
I'll quote my thesis [3] (the abstract and first chapter are especially worth a read) on this very important distinction that we must always keep in mind when evaluating these alternative approaches:
> While the problem of exclusive assignment of GPUs can be solved trivially (for example by tweaking device-plugin to advertise a greater number of `nvidia.com/gpu` than physical GPUs), the **core issue** is that of managing the friction between co-located tasks (how 2+ processes on the same node behave, irrespective of Kubernetes), and that is hard to solve.
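To make that friction concrete, here is a minimal sketch (assuming a ~16 GB GPU and an environment with PyTorch installed; the allocation sizes are arbitrary): two naively co-located processes race for GPU memory, and the loser hits a CUDA OOM error instead of being fairly scheduled.

```sh
# Process A grabs ~12 GB of GPU memory and holds it for a minute.
python3 -c "import torch, time; x = torch.empty(12 * 1024**3, dtype=torch.uint8, device='cuda'); time.sleep(60)" &
sleep 5
# Process B attempts the same allocation and, on a ~16 GB GPU, fails with CUDA OOM.
python3 -c "import torch; x = torch.empty(12 * 1024**3, dtype=torch.uint8, device='cuda')"
```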
[1] https://github.com/NVIDIA/k8s-device-plugin
[2] https://docs.nvidia.com/datacenter/tesla/mig-user-guide/index.html
[3] https://github.com/grgalex/nvshare/blob/main/grgalex-thesis.pdf
gadkins replied:

Great answer! Thank you! Ahhh, I did not realize that the Nvidia device plugin's GPU-sharing option does not gracefully handle fair sharing of memory.