Open pohly opened 3 weeks ago
In the 1.31 KEP, we included the APIs defined in https://github.com/kubernetes-sigs/wg-device-management/pull/27, but Mrunal and Tim raised legitimate concerns with the ...verbosity... of that API.
I can think of a few alternatives we can debate, and will propose them as separate PRs in this repo.
cc @thockin @mrunalp @pohly @klueska
Here are some options. Options 2 and up are each built on top of option 1.
I suggest looking at the file dra-evolution/testdata/pools-two-nodes-dgxa100.yaml in each PR. This is an example YAML for two 8-GPU servers, based on the NVIDIA simulated devices. Real-world data will be similar, but will probably add MORE attributes.
Option 1
gpu-0-memory-block-0
Option 2
Option 3
Option 4
Option 5
Option 6
Option 7
FYI, I fixed the accidental merge of the wrong PR and merged Option 1, which matches the KEP (except for the ResourcePool -> ResourceSlice naming).
I then rebased all the other PRs on top of that, so it's easier to see the deltas between the KEP and each of the options 1-4.
This was excluded from https://github.com/kubernetes-sigs/wg-device-management/pull/14 to limit the scope. It's a stretch goal for 1.31.
/assign @klueska @johnbelamaric