nerc-project / operations

Issues related to the operation of the NERC OpenShift environment

To be decided on: How to calculate GPUs when sliced #643

Open schwesig opened 4 months ago

schwesig commented 4 months ago

This story still needs more details, feedback, and research; this issue is a first approach to keep it in mind. Please comment / give feedback.

This issue should create awareness of:

  1. Usage of GPUs needs to be measured/judged in the context of the slicing
     - 1.1. e.g. 50% usage of a full A100 GPU (7g.40gb) means something different than 50% of a 1/4-MIG-sliced GPU (2g.10gb)
  2. The cost overhead
     - 2.1. What cost structure is the basis (do we lease by board, by GPU, or by sliced GPU?)
     - 2.2. Because: when we MIG-slice a GPU, we create some unusable overhead
       - 2.2.1. e.g. a full A100-40GB has 7g.40gb
       - 2.2.2. a 2× MIG-sliced A100-40GB has 2× 3g.20gb (6g & 40gb → −1g vs. full)
       - 2.2.3. a 4× MIG-sliced A100-40GB has only 3× 2g.10gb (6g & 30gb → −1g & −10gb vs. full)
       - 2.2.4. an 8× MIG-sliced A100-40GB has 7× 1g.5gb (7g & 35gb → −5gb vs. full)

this makes a 2g.10gb more expensive than 1/4 of a 7g.40gb


/CC @hpdempsey @msdisme @joachimweyl

schwesig commented 3 months ago

When we talked about it and gathered some more ideas, we realized this may need to be split into two issues later: one for the metrics idea, one for the cost allocation.

msdisme commented 3 months ago

Grooming discussion July 17:

schwesig commented 2 months ago

https://stackoverflow.com/questions/78653544/why-use-mps-time-slicing-or-mig-if-nvidias-defaults-have-better-performance

schwesig commented 2 months ago

https://raw.githubusercontent.com/nebuly-ai/nos/main/docs/en/docs/dynamic-gpu-partitioning/partitioning-modes-comparison.md

| Partitioning mode | Supported by nos | Workload isolation level | Pros | Cons |
|---|---|---|---|---|
| Multi-instance GPU (MIG) | | Best | • Processes are executed in parallel<br>• Full isolation (dedicated memory and compute resources) | • Supported by fewer GPU models (only Ampere or more recent architectures)<br>• Coarse-grained control over memory and compute resources |
| Multi-process server (MPS) | | Medium | • Processes are executed in parallel<br>• Fine-grained control over memory and compute resource allocation | • No error isolation and memory protection |
| Time-slicing | | None | • Processes are executed concurrently<br>• Supported by older GPU architectures (Pascal or newer) | • No resource limits<br>• No memory isolation<br>• Lower performance due to context-switching overhead |

"nos is the open-source module to efficiently run AI workloads on Kubernetes, increasing GPU utilization, cutting down infrastructure costs and improving workloads performance."

schwesig commented 2 months ago

https://www.infracloud.io/blogs/gpu-sharing-techniques-guide-vgpu-mig-time-slicing/

schwesig commented 2 months ago

https://github.com/nebuly-ai/nos/tree/main/demos/gpu-sharing-comparison

hpdempsey commented 2 months ago

We tested MIG slicing as a capability, but it is not offered yet as a service. It is not clear to me at all that any of our existing or forecast projects want anything less than a GPU dedicated to them. All the Red Hat requests so far are for multiple full GPUs. I don't know what is on the horizon for requests coming from BU or other academic users. Can @msdisme provide some kind of forecast for this? This will help us decide the priority of the work.

I suspect the work has to be done for each type of GPU that we are going to support. Based on the rough info in this issue so far, the observability and charging work could be quite significant. As @schwesig indicated, let's break this up: first the work to reflect GPU usage by project in observability, and the billing effort later, because we need the former even if we don't pursue "sliced" billing models later. Having a forecast of how many projects we believe will be satisfied with sliced GPUs, and how this will affect the MOC's charges (reducing them significantly from the current preemption-based, per-24-hour GPU billing policy), is necessary to pursue the second batch of work efficiently.

Can we link this issue to the issue for adding GPU usage/billing for the dedicated bare-metal GPU case (allocated through ESI), which seems to be in highest demand currently? There is no MIG option for the bare-metal case, so ultimately observability and billing will need to cover both cases. (If there isn't currently an issue for the bare-metal GPU case, please create one.)

schwesig commented 1 month ago

I found something today, maybe for some ideas.