kubernetes-sigs / wg-device-management

Prototypes and experiments for WG Device Management.
Apache License 2.0

Minimal changes for partitionable devices in DRA evolution prototype #27

Closed johnbelamaric closed 3 months ago

johnbelamaric commented 3 months ago

This PR adds the minimal fields needed to support partitionable devices. A few notes for consideration:

k8s-ci-robot commented 3 months ago

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: johnbelamaric

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

- ~~[OWNERS](https://github.com/kubernetes-sigs/wg-device-management/blob/main/OWNERS)~~ [johnbelamaric]

Approvers can indicate their approval by writing `/approve` in a comment.
Approvers can cancel approval by writing `/approve cancel` in a comment.

johnbelamaric commented 3 months ago

Partitionable devices, along with driver-side magic, can also support the idea of "compound devices". Here's how it would work.

Suppose we have two drivers, one for GPUs and one for NICs. We have nodes with 8 GPUs and 4 NICs. We want to allow certain "valid" combinations of these to be consumed as a unit. The "rules" here are:

This implies the following valid "molecules" for the triplet (gpu0, gpu1, nic0):

Similarly, there are 5 valid combinations for each of the other triplets.

So, what we do is create a "compound device" driver that runs on the node but acts as an intermediary between the K8s control plane and the drivers for the underlying devices. It contains an in-process mini API server that serves the ResourcePool API, and we point the GPU and NIC drivers at that local instance. The compound device driver uses those to construct a new compound pool on top of those drivers that follows the rules above, using this partitionable model:

```yaml
apiVersion: resource.k8s.io/v1alpha3
kind: ResourcePool
metadata:
  name: node0-compound0
spec:
  driver: compound.example.com
  nodeName: node0
  devices:
  - name: gpu0
    sharedConsumed:
      gpu0: 1
  - name: gpu0-nic0
    sharedConsumed:
      gpu0: 1
      nic0: 1
  - name: gpu1
    sharedConsumed:
      gpu1: 1
  - name: gpu1-nic0
    sharedConsumed:
      gpu1: 1
      nic0: 1
  - name: gpu0-gpu1-nic0
    sharedConsumed:
      gpu0: 1
      gpu1: 1
      nic0: 1
...
  sharedConsumable:
  - name: gpu0
    capacity: 1
  - name: gpu1
    capacity: 1
  - name: gpu2
    capacity: 1
  - name: gpu3
    capacity: 1
  - name: gpu4
    capacity: 1
  - name: gpu5
    capacity: 1
  - name: gpu6
    capacity: 1
  - name: gpu7
    capacity: 1
  - name: nic0
    capacity: 1
  - name: nic1
    capacity: 1
  - name: nic2
    capacity: 1
  - name: nic3
    capacity: 1
```
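
To make the effect of `sharedConsumed` / `sharedConsumable` concrete, here is a rough sketch of the bookkeeping an allocator might do against a pool like the one above. The `Pool` class and its method names are hypothetical, invented for illustration; this is not the scheduler's actual logic, only the capacity arithmetic the model implies.

```python
from dataclasses import dataclass, field


@dataclass
class Pool:
    """Hypothetical in-memory view of a ResourcePool spec (illustration only)."""
    shared_consumable: dict  # shared item name -> capacity
    devices: dict            # device name -> its sharedConsumed map
    allocated: set = field(default_factory=set)

    def remaining(self) -> dict:
        # Start from full capacity and subtract what each allocated device consumes.
        left = dict(self.shared_consumable)
        for dev in self.allocated:
            for name, amount in self.devices[dev].items():
                left[name] -= amount
        return left

    def can_allocate(self, device: str) -> bool:
        if device in self.allocated:
            return False
        left = self.remaining()
        return all(left[n] >= amt for n, amt in self.devices[device].items())

    def allocate(self, device: str) -> None:
        if not self.can_allocate(device):
            raise ValueError(f"{device} does not fit in the remaining shared capacity")
        self.allocated.add(device)


# Only the first triplet from the pool above, to keep the example short.
pool = Pool(
    shared_consumable={"gpu0": 1, "gpu1": 1, "nic0": 1},
    devices={
        "gpu0": {"gpu0": 1},
        "gpu0-nic0": {"gpu0": 1, "nic0": 1},
        "gpu1": {"gpu1": 1},
        "gpu1-nic0": {"gpu1": 1, "nic0": 1},
        "gpu0-gpu1-nic0": {"gpu0": 1, "gpu1": 1, "nic0": 1},
    },
)
pool.allocate("gpu0-nic0")
available = sorted(d for d in pool.devices if pool.can_allocate(d))
print(available)
```

Once `gpu0-nic0` is taken, both `gpu0` and `nic0` are exhausted, so every other device that consumes either of them drops out and only the standalone `gpu1` remains allocatable.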

The compound device driver is the only one that actually publishes anything to the K8s control plane. It is also what kubelet makes calls to, and it in turn calls down to the other drivers.
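
As a sketch of the "construct a new compound pool" step, the five-device pattern shown for (gpu0, gpu1, nic0) could be stamped out per triplet. Note the assumptions: the grouping of the remaining devices into (gpu2, gpu3, nic1) and so on is a guess, since only the first triplet is spelled out here, and the function names are made up for illustration.

```python
def compound_devices(gpu_a, gpu_b, nic):
    """Return the five device entries for one (gpu, gpu, nic) triplet,
    mirroring the pattern shown for (gpu0, gpu1, nic0)."""
    combos = [
        (gpu_a,),
        (gpu_a, nic),
        (gpu_b,),
        (gpu_b, nic),
        (gpu_a, gpu_b, nic),
    ]
    return [
        {"name": "-".join(parts), "sharedConsumed": {p: 1 for p in parts}}
        for parts in combos
    ]


def compound_pool_spec(node, triplets):
    """Build a dict shaped like the `spec` of the compound ResourcePool."""
    devices, gpus, nics = [], [], []
    for gpu_a, gpu_b, nic in triplets:
        devices.extend(compound_devices(gpu_a, gpu_b, nic))
        gpus += [gpu_a, gpu_b]
        nics.append(nic)
    return {
        "driver": "compound.example.com",
        "nodeName": node,
        "devices": devices,
        # Every underlying GPU and NIC is a shared item with capacity 1.
        "sharedConsumable": [{"name": n, "capacity": 1} for n in gpus + nics],
    }


# ASSUMPTION: triplet i pairs gpu(2i) and gpu(2i+1) with nic(i).
triplets = [(f"gpu{2 * i}", f"gpu{2 * i + 1}", f"nic{i}") for i in range(4)]
spec = compound_pool_spec("node0", triplets)
print(len(spec["devices"]), len(spec["sharedConsumable"]))
```

In a real driver the combinations would come from whatever the underlying GPU and NIC drivers publish to the local ResourcePool endpoint, rather than from a hard-coded naming scheme like this one.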

There are lots of details to work out for this, of course. For example, ideally users don't need to know they are using this intermediary, except maybe based on the class they choose. This would mean that the CEL-based attributes they use should still be the ones used by the underlying devices, rather than some that are particular to the compound device driver (which also may have some). For that, we may need to make sure that attributes are qualified, always, rather than allowing the short-hand of "unqualified means from the driver". Otherwise I can see a lot of confusion, especially during copy-and-paste situations.
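
A toy illustration of the copy-and-paste hazard with the unqualified shorthand. The `domain/name` notation, the `resolve` helper, and the `gpu.example.com` driver name are all assumptions made up for this sketch; they are not the actual attribute syntax.

```python
def resolve(attr, publishing_driver):
    """Expand the shorthand: an unqualified attribute name is taken to
    belong to whichever driver published the device (hypothetical rule)."""
    return attr if "/" in attr else f"{publishing_driver}/{attr}"


# The same unqualified selector text, evaluated against a device published
# directly by the GPU driver and against one published by the compound driver.
direct = resolve("model", "gpu.example.com")
via_compound = resolve("model", "compound.example.com")

# A fully qualified name means the same thing no matter who publishes it.
qualified = resolve("gpu.example.com/model", "compound.example.com")

print(direct, via_compound, qualified)
```

With the shorthand, a selector copied from a plain GPU class silently starts referring to a different attribute once the compound driver is the publisher. Always-qualified names avoid that.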

There are also a few limitations:

klueska commented 3 months ago

I'm not sold on the complex-device scenario you proposed here, but I think we could iterate on that later. The more important thing is to agree on the API for partitionable devices, and I'm fairly happy with the naming / structure I proposed in my comment here: https://github.com/kubernetes-sigs/wg-device-management/pull/27/files#r1634768662

johnbelamaric commented 3 months ago

> I'm not sold on the complex-device scenario you proposed here, but I think we could iterate on that later. The more important thing is to agree on the API for partitionable devices, and I'm fairly happy with the naming / structure I proposed in my comment here: https://github.com/kubernetes-sigs/wg-device-management/pull/27/files#r1634768662

Yeah, that is 100% on top of this without affecting what this looks like. It's something I want to prototype before too long - but it would be out-of-tree anyway :)

johnbelamaric commented 3 months ago

SGTM

pohly commented 3 months ago

The rationale for that:

thockin commented 3 months ago

Since this is "what's in the KEP" I think we should merge it and rebase all the options on it, so they appear as diffs. But I screwed up and LGTM'ed option 2 (#29). I don't have super on this repo, so I cannot manually fix it.

johnbelamaric commented 3 months ago

Ok, this matches the KEP. Merging.