grgalex / nvshare

Practical GPU Sharing Without Memory Size Constraints
Apache License 2.0

Question: Comparison to Nvidia GPU Operator + GPU Feature Discovery #9

Open · gadkins opened this issue 1 year ago

gadkins commented 1 year ago

Apologies if Issues is the wrong place for my question, but I don't see a Discussions forum for this repo.

I've read your Medium article, which provides a nice summary of what problem nvshare is solving.

However, I also came across this blog post from VMware, which describes GPU virtualization in Kubernetes via Nvidia's GPU Operator and GPU Feature Discovery; the latter adds labels to the Nodes such as nvidia.com/vgpu.present=true and facilitates fractional allocation of GPUs to Pods.

How does nvshare differ and/or what additional value does it provide?

grgalex commented 1 year ago

@gadkins

The GPU Operator and GPU Feature Discovery are auxiliary mechanisms that make it easier to manage GPUs in a K8s cluster. They make life easier by automatically installing the Nvidia drivers and the Device Plugin (in the case of the Operator) and by automatically adding labels/taints to nodes with GPUs (in the case of GPU Feature Discovery). On their own they don't implement GPU sharing.
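
To make the division of labor concrete, here is roughly what GPU Feature Discovery ends up attaching to a GPU node (a representative subset; the exact label set and values depend on the GFD version and the GPU model):

```yaml
# Representative node labels added by GPU Feature Discovery
# (illustrative values; the exact set varies by GFD version and GPU).
nvidia.com/gpu.count: "1"
nvidia.com/gpu.product: "Tesla-V100-SXM2-16GB"
nvidia.com/gpu.memory: "16384"
nvidia.com/cuda.driver.major: "535"
nvidia.com/mig.capable: "false"
```

None of these labels make the GPU shareable by themselves; they only inform scheduling decisions.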

AFAIK, Nvidia offers two mechanisms for sharing a GPU between multiple containers. These are exposed to a Kubernetes cluster through the official device plugin [1].

TL;DR

  1. Multi-Instance GPU (MIG) requires special hardware and chops a GPU into disjoint pieces, each with its own memory and compute units. There is no sharing between the pieces: each process/container exclusively uses one or more of them, and if you want a process to use the whole GPU, it gets all the pieces, so again there is no sharing.
  2. "Nvidia Device Plugin GPU Sharing" lets processes/containers loose on the same GPU. They can cause each other to go OOM. Quite chaotic.

To get an experience akin to (2) with nvshare, turn the nvshare-scheduler OFF through the CLI. This falls back to the default "CUDA black-box" scheduling.

Instead of going OOM, processes may then thrash the GPU.

1. MIG (Multi-Instance-GPU)

This requires special hardware (Ampere or newer data-center GPUs, e.g., A100, A30, H100). The GPU's hardware is segmented in a way that allows the driver to offer "true" splits of the GPU as independent devices.

You can skim through the official docs [2] for an overview on how that works.
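
To make it concrete, with the device plugin's `mixed` MIG strategy each slice shows up as its own extended resource, and a Pod requests it like any other resource (a sketch assuming an A100-40GB and a pre-created 3g.20gb slice; profile names differ per GPU):

```yaml
# Sketch: a Pod requesting one MIG slice (mixed MIG strategy).
apiVersion: v1
kind: Pod
metadata:
  name: mig-example
spec:
  containers:
  - name: cuda-app
    image: nvidia/cuda:12.2.0-base-ubuntu22.04  # placeholder image
    resources:
      limits:
        # One 3g.20gb slice of an A100-40GB; the slice is exclusively
        # owned by this container, so there is still no sharing.
        nvidia.com/mig-3g.20gb: 1
```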

2. "Nvidia Device Plugin GPU Sharing"

Since version 0.12.0, the NVIDIA device plugin officially provides an option to share a GPU between multiple containers via time-slicing (https://developer.nvidia.com/blog/improving-gpu-utilization-in-kubernetes/).
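
For context, enabling it boils down to handing the device plugin a time-slicing config along these lines (a sketch; check the config schema of your plugin version), which advertises each physical GPU as N replicas of nvidia.com/gpu:

```yaml
# Sketch: time-slicing config for the NVIDIA device plugin (v0.12.0+).
# Each physical GPU is advertised as 4 nvidia.com/gpu replicas.
# Note: nothing here limits how much GPU memory each container may allocate.
version: v1
sharing:
  timeSlicing:
    resources:
    - name: nvidia.com/gpu
      replicas: 4
```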

Memory is still the core problem. Quoting them:

> The tradeoffs with time-slicing are increased latency, jitter, and potential out-of-memory (OOM) conditions when many different applications are time-slicing on the GPU.

This merely solves the 1-1 assignment problem at the K8s level and does nothing to prevent OOM and friction between co-located apps.

I'll quote my thesis [3] (the abstract and first chapter are especially worth a read) on this very important distinction that we must always keep in mind when evaluating these alternative approaches:

> While the problem of exclusive assignment of GPUs can be solved trivially (for example by tweaking device-plugin to advertise a greater number of nvidia.com/gpu than physical GPUs), the **CORE ISSUE** is that of managing the friction between co-located tasks (how 2+ processes on the same node behave, irrespective of Kubernetes) and that is hard to solve.
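
That co-location friction (memory pressure in particular) is exactly what nvshare's scheduler is there to manage. At the Kubernetes level, usage looks like any other extended resource; a Pod requests nvshare.com/gpu instead of nvidia.com/gpu (a sketch; see the README for the exact manifests):

```yaml
# Sketch: a Pod using an nvshare virtual GPU.
apiVersion: v1
kind: Pod
metadata:
  name: nvshare-example
spec:
  containers:
  - name: cuda-app
    image: nvidia/cuda:12.2.0-base-ubuntu22.04  # placeholder image
    resources:
      limits:
        # Multiple Pods requesting nvshare.com/gpu can land on the same
        # physical GPU; nvshare mediates memory and compute between them.
        nvshare.com/gpu: 1
```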

[1] https://github.com/NVIDIA/k8s-device-plugin
[2] https://docs.nvidia.com/datacenter/tesla/mig-user-guide/index.html
[3] https://github.com/grgalex/nvshare/blob/main/grgalex-thesis.pdf

gadkins commented 1 year ago

Great answer! Thank you!

Ahhh, I did not realize that the Nvidia device plugin's GPU sharing does not gracefully handle fair sharing of memory.