intel / intel-device-plugins-for-kubernetes

Collection of Intel device plugins for Kubernetes
Apache License 2.0

Options for GPU Sharing between Containers Running on a Workstation #1769

Open frenchwr opened 3 months ago

frenchwr commented 3 months ago

Describe the support request

Hello, I'm trying to understand options that would allow multiple containers to share a single GPU.

I see that K8s device plugins in general are not meant to allow a device to be shared between containers.

I also see from the GPU plugin docs in this repo that there is a sharedDevNum option that can be used for sharing a GPU, but I infer this partitions the GPU's resources so that each container is only allocated a fraction of them. Is that correct?

My use case is a tool called data-science-stack that is being built to automate the deployment/management of GPU-enabled containers for quick AI/ML experimentation on a user's laptop or workstation. In this scenario we'd prefer that each container has access to the full GPU resources - much like you'd expect for applications running directly on the host. Is this possible?


eero-t commented 3 months ago

sharedDevNum is mostly intended to be used when either:

[1] GPU Aware Scheduling: https://github.com/intel/platform-aware-scheduling/tree/master/gpu-aware-scheduling

> In this scenario we'd prefer that each container has access to the full GPU resources

If each container (in the cluster) is supposed to have exclusive access to the GPU device, use 1 for sharedDevNum.
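For illustration, here is roughly how that looks when the plugin is deployed through the operator's GpuDevicePlugin resource. This is only a minimal sketch; the name and image tag below are placeholders, so check the release matching your cluster.

```yaml
# Sketch: GpuDevicePlugin CR configured for exclusive GPU access.
apiVersion: deviceplugin.intel.com/v1
kind: GpuDevicePlugin
metadata:
  name: gpudeviceplugin-sample          # placeholder name
spec:
  image: intel/intel-gpu-plugin:0.29.0  # placeholder tag, use a current release
  sharedDevNum: 1                       # 1 = each GPU is handed to exactly one container
  logLevel: 2
```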

frenchwr commented 3 months ago

> If each container (in the cluster) is supposed to have exclusive access to the GPU device, use 1 for sharedDevNum.

But this does not allow the GPU to be shared between containers, correct?

Maybe a bit more context about the use case would help. We are building an application that simplifies the deployment of GPU-enabled containers (for example, using Intel's ITEX and IPEX images). This is not meant for deployments across a cluster of nodes; there is just a single node (the user's laptop or workstation).

Each container runs a Jupyter Notebook server. Ideally, a user could be on a workstation with a single GPU and multiple containers running, with each provided full access to the GPU. Notebook workloads are typically very bursty, so container A may run a notebook cell that is very GPU intensive while container B is idle. In cases where both containers are simultaneously requesting GPU acceleration, ideally that would be handled the same way (or close to the same way) as two applications running directly on the host OS requesting GPU resources.

tkatila commented 3 months ago

@frenchwr sharedDevNum is the option you would most likely want. Any container requesting the gpu.intel.com/i915 resource with sharedDevNum > 1 will get "unlimited" access to the GPU - "unlimited" in the sense that there is no hard partitioning in place. Obviously, if two containers try to run on the same GPU, they will compete for the same resources (execution time, memory).
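To make that concrete, a container opts in by requesting the resource in its pod spec. A minimal sketch (the pod name and image are illustrative, not part of anything above):

```yaml
# Sketch: a pod requesting one gpu.intel.com/i915 slot.
# With sharedDevNum > 1, several such pods can be scheduled onto the same physical GPU.
apiVersion: v1
kind: Pod
metadata:
  name: notebook-a                                       # illustrative name
spec:
  containers:
  - name: jupyter
    image: intel/intel-extension-for-tensorflow:latest   # placeholder image
    resources:
      limits:
        gpu.intel.com/i915: 1   # one shared slot, not a hard partition of the GPU
```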

frenchwr commented 3 months ago

@tkatila Thanks for clarifying! I agree this sounds like the way to go. A few more follow-up questions:

tkatila commented 3 months ago
> Am I right that running with resource management disabled (default behavior) would make the most sense for our use case?

Yes, that's correct; keep it disabled. To enable resource management you would also need another Kubernetes component, GPU Aware Scheduling (GAS). Its setup requires some extra effort, and I don't see any benefit from it in your case.
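In practice that just means deploying the plugin as-is. As a rough sketch of the relevant container arguments in the plugin DaemonSet (flag names as documented in the plugin README; double-check them against the release you actually deploy):

```yaml
# Excerpt (sketch) of the GPU plugin container spec.
# Omitting -resource-manager keeps resource management disabled, which is the default.
containers:
- name: intel-gpu-plugin
  image: intel/intel-gpu-plugin:0.29.0  # placeholder tag
  args:
  - "-shared-dev-num=10"                # advertise 10 slots per physical GPU
  # no "-resource-manager" flag -> no GAS / fractional resource management
```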

> Is there any performance impact from the number you choose for sharedDevNum? For example, using 2 vs. 10 vs. 100. I guess not, since there is no partitioning of the GPU resources, but I just want to confirm. Is there any reason not to choose an arbitrarily large number if our goal is to expose the full GPU to each container?

I don't think we have a guide for selecting the number, but something between 10 and 100 would be fine. The downside of an extremely large number is that it can incur some extra CPU and network bandwidth utilization: the GPU plugin detects the number of GPUs, multiplies it by sharedDevNum, and then advertises that many duplicate resources for the node. Carrying all of those resources through resource registration and scheduling has some minor effect, but as long as sharedDevNum is within a sensible range, the effect shouldn't be noticeable. See the sketch below for a quick way to check what the node ends up advertising.
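As a quick sanity check after deploying, the node's allocatable count should be the number of GPUs multiplied by sharedDevNum, e.g. 1 GPU x sharedDevNum 10 = 10. Something along these lines, where the node name and output are illustrative:

```console
$ kubectl get node my-workstation \
    -o jsonpath='{.status.allocatable.gpu\.intel\.com/i915}'
10
```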