kubernetes-sigs / wg-device-management

Prototypes and experiments for WG Device Management.
Apache License 2.0

dra-evolution: partitioning of devices #20

Open pohly opened 3 weeks ago

pohly commented 3 weeks ago

This was excluded from https://github.com/kubernetes-sigs/wg-device-management/pull/14 to limit the scope. It's a stretch goal for 1.31.

/assign @klueska @johnbelamaric

johnbelamaric commented 2 weeks ago

In the 1.31 KEP, we included the APIs defined in https://github.com/kubernetes-sigs/wg-device-management/pull/27, but Mrunal and Tim raised legitimate concerns with the ...verbosity... of that API.

I can think of a few alternatives we can debate, and will propose them as separate PRs in this repo.

cc @thockin @mrunalp @pohly @klueska

johnbelamaric commented 2 weeks ago

Here are some options. Options 2 and up are each built on top of option 1.

I suggest looking at the file dra-evolution/testdata/pools-two-nodes-dgxa100.yaml in each PR. This is an example YAML for two 8-GPU servers, based on the NVIDIA simulated devices. Real-world devices will look similar, but will probably add more attributes.

Option 1

Option 2

Option 3

Option 4

Option 5

Option 6

Option 7

johnbelamaric commented 2 weeks ago

FYI, I fixed the accidental merge of the wrong PR, and merged Option 1, which matches the KEP (except in the ResourcePool -> ResourceSlice naming).

I then rebased all the other PRs on top of that, so it's easier to see the deltas between the KEP and each of the options 1-4.