Closed johnbelamaric closed 3 months ago
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: johnbelamaric
Partitionable devices, along with driver-side magic, can also support the idea of "compound devices". Here's how it would work.
Suppose we have two drivers, one for GPUs and one for NICs. We have nodes with 8 GPUs and 4 NICs. We want to allow certain "valid" combinations of these to be consumed as a unit. The "rules" here are:
This implies the following valid "molecules" for the triplet (gpu0, gpu1, nic0):
Similarly, there are 5 valid combinations for each of the other triplets.
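As a rough illustration of that enumeration, here is a minimal Go sketch that builds the five combinations per triplet. The rule list itself is not reproduced in this thread, so the five shapes are copied from the device entries in the ResourcePool example in this comment. The `CompoundDevice` type, the function names, and the assumption that triplet i is (gpu2i, gpu2i+1, nic i) are all hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// CompoundDevice is a hypothetical stand-in for one entry in the pool's
// device list: a name plus the shared items it consumes.
type CompoundDevice struct {
	Name           string
	SharedConsumed map[string]int64
}

// moleculesForTriplet returns the five combinations for one triplet, in the
// same shapes as the pool example: each GPU alone, each GPU with the NIC,
// and both GPUs with the NIC.
func moleculesForTriplet(gpuA, gpuB, nic string) []CompoundDevice {
	combos := [][]string{
		{gpuA},
		{gpuA, nic},
		{gpuB},
		{gpuB, nic},
		{gpuA, gpuB, nic},
	}
	devices := make([]CompoundDevice, 0, len(combos))
	for _, parts := range combos {
		consumed := make(map[string]int64, len(parts))
		for _, p := range parts {
			consumed[p] = 1 // each member is consumed whole
		}
		devices = append(devices, CompoundDevice{
			Name:           strings.Join(parts, "-"),
			SharedConsumed: consumed,
		})
	}
	return devices
}

// allMolecules assumes triplet i is (gpu2i, gpu2i+1, nic i). Only the first
// triplet is spelled out in the discussion, so this pairing is an assumption.
func allMolecules(nics int) []CompoundDevice {
	var out []CompoundDevice
	for i := 0; i < nics; i++ {
		out = append(out, moleculesForTriplet(
			fmt.Sprintf("gpu%d", 2*i),
			fmt.Sprintf("gpu%d", 2*i+1),
			fmt.Sprintf("nic%d", i),
		)...)
	}
	return out
}

func main() {
	for _, d := range allMolecules(4) {
		fmt.Println(d.Name, d.SharedConsumed)
	}
}
```

Under the pairing assumption, 4 NICs give 4 triplets and therefore 20 compound devices in total.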
So, what we do is create a "compound device" driver that runs on the node but acts as an intermediary between the K8s control plane and the drivers for the underlying devices. It contains an in-process mini API server that serves the ResourcePool API, and we point the GPU and NIC drivers at that local instance. The compound device driver uses those to construct a new compound pool on top of those drivers that follows the rules above, using this partitionable model:
apiVersion: resource.k8s.io/v1alpha3
kind: ResourcePool
metadata:
  name: node0-compound0
spec:
  driver: compound.example.com
  nodeName: node0
  devices:
  - name: gpu0
    sharedConsumed:
      gpu0: 1
  - name: gpu0-nic0
    sharedConsumed:
      gpu0: 1
      nic0: 1
  - name: gpu1
    sharedConsumed:
      gpu1: 1
  - name: gpu1-nic0
    sharedConsumed:
      gpu1: 1
      nic0: 1
  - name: gpu0-gpu1-nic0
    sharedConsumed:
      gpu0: 1
      gpu1: 1
      nic0: 1
  ...
  sharedConsumable:
  - name: gpu0
    capacity: 1
  - name: gpu1
    capacity: 1
  - name: gpu2
    capacity: 1
  - name: gpu3
    capacity: 1
  - name: gpu4
    capacity: 1
  - name: gpu5
    capacity: 1
  - name: gpu6
    capacity: 1
  - name: gpu7
    capacity: 1
  - name: nic0
    capacity: 1
  - name: nic1
    capacity: 1
  - name: nic2
    capacity: 1
  - name: nic3
    capacity: 1
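To make the consumption model in the pool above concrete, here is a minimal Go sketch of how an allocator might check a device's sharedConsumed entries against what is left of sharedConsumable. The `Pool` type and its methods are hypothetical and only illustrate the bookkeeping; this is not the actual scheduler or driver code.

```go
package main

import "fmt"

// Pool tracks the remaining capacity of each shared item
// (the sharedConsumable entries in the example).
type Pool struct {
	remaining map[string]int64
}

// NewPool copies the given capacities so callers keep their own map.
func NewPool(capacity map[string]int64) *Pool {
	r := make(map[string]int64, len(capacity))
	for k, v := range capacity {
		r[k] = v
	}
	return &Pool{remaining: r}
}

// Allocate deducts everything the device consumes. If any item is short,
// it returns an error and leaves the pool unchanged.
func (p *Pool) Allocate(name string, consumed map[string]int64) error {
	// First pass: verify every item fits before touching anything.
	for item, qty := range consumed {
		if p.remaining[item] < qty {
			return fmt.Errorf("device %s: not enough %s left", name, item)
		}
	}
	// Second pass: commit the deductions.
	for item, qty := range consumed {
		p.remaining[item] -= qty
	}
	return nil
}

func main() {
	pool := NewPool(map[string]int64{"gpu0": 1, "gpu1": 1, "nic0": 1})
	fmt.Println(pool.Allocate("gpu0-nic0", map[string]int64{"gpu0": 1, "nic0": 1}))
	fmt.Println(pool.Allocate("gpu1-nic0", map[string]int64{"gpu1": 1, "nic0": 1}))
	fmt.Println(pool.Allocate("gpu1", map[string]int64{"gpu1": 1}))
}
```

In this run the first allocation succeeds, the second is rejected because nic0 is already consumed, and the third succeeds because gpu1 is still free. Overlapping compound devices exclude each other this way without any explicit conflict list.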
The compound device driver is the only one that actually publishes anything to the K8s control plane. It is also what kubelet makes calls to, and it in turn calls down to the other drivers.
There are lots of details to work out for this, of course. For example, ideally users don't need to know they are using this intermediary, except maybe based on the class they choose. This would mean that the CEL-based attributes they use should still be the ones used by the underlying devices, rather than some that are particular to the compound device driver (which also may have some). For that, we may need to make sure that attributes are qualified, always, rather than allowing the short-hand of "unqualified means from the driver". Otherwise I can see a lot of confusion, especially during copy-and-paste situations.
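To illustrate why always-qualified names would help, here is a small Go sketch of a compound driver merging attributes from two underlying drivers. The `<driver>/<attribute>` form, the driver names, and the `model` attribute are assumptions made for the example, not something this PR defines.

```go
package main

import (
	"fmt"
	"strings"
)

// qualify returns attr unchanged if it already carries a "<domain>/" prefix,
// and otherwise prefixes it with the publishing driver's name.
func qualify(driver, attr string) string {
	if strings.Contains(attr, "/") {
		return attr
	}
	return driver + "/" + attr
}

// mergeAttributes collects attributes from each underlying driver into one
// map keyed by qualified name, so a compound device can republish them
// without two drivers' short names colliding.
func mergeAttributes(perDriver map[string]map[string]string) map[string]string {
	merged := make(map[string]string)
	for driver, attrs := range perDriver {
		for name, value := range attrs {
			merged[qualify(driver, name)] = value
		}
	}
	return merged
}

func main() {
	merged := mergeAttributes(map[string]map[string]string{
		"gpu.example.com": {"model": "gpu-model-a"},
		"nic.example.com": {"model": "nic-model-b"},
	})
	fmt.Println(merged["gpu.example.com/model"], merged["nic.example.com/model"])
}
```

With unqualified names, both drivers' `model` attributes would map to the same key once they are republished by the compound driver, which is the copy-and-paste confusion described above.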
There are also a few limitations:
I'm not sold on the complex-device scenario you proposed here, but I think we could iterate on that later. The more important thing is to agree on the API for partitionable devices, and I'm fairly happy with the naming / structure I proposed in my comment here: https://github.com/kubernetes-sigs/wg-device-management/pull/27/files#r1634768662
Yeah, that is 100% on top of this without affecting what this looks like. It's something I want to prototype before too long - but it would be out-of-tree anyway :)
SGTM
The rationale for that:
Since this is "what's in the KEP" I think we should merge it and rebase all the options on it, so they appear as diffs. But I screwed up and LGTM'ed option 2 (#29). I don't have super on this repo, so I cannot manually fix it.
Ok, this matches the KEP. Merging.
This PR adds the minimal fields needed to support partitionable devices. A few notes for consideration:
SharedAllocatable for the shared pooled resources, and SharedAllocatableConsumed for the device values that consume items from the pool.
SharedAllocatable is a []ResourceCapacity, which is a struct with just name and quantity. This leaves out BlockSize and IntRange stuff to keep it as simple as possible.
SharedAllocatableConsumed is a map[string]resource.Quantity to mirror PodSpec requests. Given that this is the capacity model, not the claim model, consistency may not be needed here. In that case, we can probably change this to a struct instead, which would give us more room for expansion in the future.
Since SharedAllocatable is directly in ResourcePool, that means each partitionable device needs to be its own pool. We could consider two other options:
	SharedAllocatable []AllocatableGroup
}

type AllocatableGroup struct {
	Name        string
	Allocatable []ResourceCapacity
}

type Device struct {
	...
	SharedAllocatableConsumed []ResourceRequest
}

type ResourceRequest struct {
	AllocatableGroupName string
	ResourceName         string
	Quantity             resource.Quantity
}
AllocatableGroup option as well, since it is incremental. I don't at this point like the last option much.
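For comparison, here is a minimal Go sketch of how the AllocatableGroup variant could let several partitionable devices share one pool. The struct shapes follow the fragment above; `Quantity` is an int64 stand-in for resource.Quantity to keep the sketch dependency-free, and the `fits` helper plus the `memory` and `slices` resource names are hypothetical.

```go
package main

import "fmt"

// Quantity stands in for resource.Quantity so the sketch needs no imports
// beyond the standard library.
type Quantity int64

type ResourceCapacity struct {
	Name     string
	Quantity Quantity
}

type AllocatableGroup struct {
	Name        string
	Allocatable []ResourceCapacity
}

type ResourceRequest struct {
	AllocatableGroupName string
	ResourceName         string
	Quantity             Quantity
}

type Device struct {
	Name                      string
	SharedAllocatableConsumed []ResourceRequest
}

// key identifies one resource inside one group.
type key struct{ group, resource string }

// fits reports whether every request of the device can be satisfied by the
// named group, treating each group as an independent set of counters.
func fits(groups []AllocatableGroup, d Device) bool {
	avail := make(map[key]Quantity)
	for _, g := range groups {
		for _, c := range g.Allocatable {
			avail[key{g.Name, c.Name}] += c.Quantity
		}
	}
	for _, r := range d.SharedAllocatableConsumed {
		k := key{r.AllocatableGroupName, r.ResourceName}
		avail[k] -= r.Quantity
		if avail[k] < 0 {
			return false
		}
	}
	return true
}

func main() {
	groups := []AllocatableGroup{
		{Name: "gpu0", Allocatable: []ResourceCapacity{{"memory", 40}, {"slices", 7}}},
		{Name: "gpu1", Allocatable: []ResourceCapacity{{"memory", 40}, {"slices", 7}}},
	}
	half := Device{Name: "gpu0-half", SharedAllocatableConsumed: []ResourceRequest{
		{"gpu0", "memory", 20}, {"gpu0", "slices", 3},
	}}
	fmt.Println(fits(groups, half))
}
```

Because each request names its group, two physical GPUs can live in the same pool without their counters mixing, which a single flat SharedAllocatable list directly on the pool cannot express.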