schwesig opened 4 months ago
When we talked about this and gathered more ideas, it became clear that it may need to be split into two issues later: one for the metrics idea and one for the cost allocation.
Grooming discussion July 17:
| Partitioning mode | Supported by nos | Workload isolation level | Pros | Cons |
|---|---|---|---|---|
| Multi-instance GPU (MIG) | ✅ | Best | | |
| Multi-process server (MPS) | ✅ | Medium | | |
| Time-slicing | ❌ | None | | |
"nos is the open-source module to efficiently run AI workloads on Kubernetes, increasing GPU utilization, cutting down infrastructure costs and improving workloads performance."
We tested MIG slicing as a capability, but it is not yet offered as a service. It is not at all clear to me that any of our existing or forecasted projects want anything less than a GPU dedicated to them. All the Red Hat requests so far are for multiple full GPUs. I don't know what is on the horizon for requests coming from BU or other academic users. Can @msdisme provide some kind of forecast for this? This will help us decide the priority of the work.
I suspect the work has to be done for each type of GPU that we are going to support. The observability and charging work could be quite significant, based on the rough info in this issue so far. As @schwesig indicated, let's break this up into the work to reflect GPU usage by project in observability first, and the billing effort later, because we need the former even if we don't pursue "sliced" billing models later. Having a forecast of how many projects we believe will be satisfied with sliced GPUs, and of how this will affect the MOC's charges (reducing them significantly from the current pre-emptive, per-24-hour GPU billing policy), is necessary to pursue the second batch of work efficiently.
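For the first batch (GPU usage by project in observability), a minimal sketch of the kind of query involved, assuming dcgm-exporter scraped by Prometheus; the metric name (`DCGM_FI_DEV_GPU_UTIL`), the `namespace` label, and the Prometheus URL are assumptions to verify against our stack:

```python
# Hypothetical sketch: per-namespace GPU utilization from dcgm-exporter metrics
# via the Prometheus HTTP API. Metric name, label names, and URL are assumptions
# that depend on how dcgm-exporter and Prometheus are configured.
import requests

PROMETHEUS_URL = "http://prometheus.example.internal:9090"  # placeholder


def gpu_util_by_namespace(window="24h"):
    # Average GPU utilization over the window, grouped by namespace.
    query = f"avg by (namespace) (avg_over_time(DCGM_FI_DEV_GPU_UTIL[{window}]))"
    resp = requests.get(f"{PROMETHEUS_URL}/api/v1/query", params={"query": query})
    resp.raise_for_status()
    return {
        item["metric"].get("namespace", "<unlabeled>"): float(item["value"][1])
        for item in resp.json()["data"]["result"]
    }


if __name__ == "__main__":
    for namespace, util in sorted(gpu_util_by_namespace().items()):
        print(f"{namespace}: {util:.1f}% avg GPU utilization")
```

Grouping by `pod` as well would give the finer project-level breakdown that any later billing work could build on.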
Can we link this issue to the issue for adding GPU usage/billing for the dedicated bare-metal GPU case (allocated through ESI), which seems to be in highest demand currently? There is no MIG option for the bare-metal case, so ultimately observability and billing will need to cover both cases. (If there isn't currently an issue for the bare-metal GPU case, please create one.)
I found something today, maybe it gives some ideas.
The story needs more details, feedback, and research, but this is a first approach to keep in mind alongside this issue. Please comment / give feedback.
This issue should create awareness that a 2g.10gb is more expensive than 1/4 of a 7g.40gb: on an A100-40GB only three 2g.10gb instances fit per GPU, so each one has to cover more than a quarter of the full-GPU cost (see the sketch below).
source
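To make the cost point concrete, here is a hypothetical allocation rule (not a decided policy): charge each MIG profile the full-GPU hourly rate divided by how many instances of that profile fit on one A100-40GB (NVIDIA's homogeneous MIG profile counts); the $/hr figure is a placeholder.

```python
# Hypothetical pricing sketch: each MIG profile is charged a share of the
# full-GPU hourly rate based on how many instances of that profile fit on one
# A100-40GB (homogeneous packing, per NVIDIA's MIG profile table). The hourly
# rate is a placeholder, and "divide by max instances" is only one possible rule.
FULL_GPU_RATE_PER_HOUR = 2.00  # placeholder $/hr for a whole A100-40GB

MAX_INSTANCES_PER_GPU = {
    "1g.5gb": 7,
    "2g.10gb": 3,
    "3g.20gb": 2,
    "4g.20gb": 1,
    "7g.40gb": 1,
}


def slice_rate(profile):
    return FULL_GPU_RATE_PER_HOUR / MAX_INSTANCES_PER_GPU[profile]


if __name__ == "__main__":
    for profile in MAX_INSTANCES_PER_GPU:
        print(f"{profile}: ${slice_rate(profile):.3f}/hr")
    # A 2g.10gb works out to 1/3 of the full-GPU rate, i.e. more than the 1/4
    # that a naive "quarter of a 7g.40gb" split would suggest.
    assert slice_rate("2g.10gb") > slice_rate("7g.40gb") / 4
```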
/CC @hpdempsey @msdisme @joachimweyl