kubernetes-sigs / wg-device-management

Prototypes and experiments for WG Device Management.

Partitionable with common attributes #30

Open · johnbelamaric opened 3 months ago

johnbelamaric commented 3 months ago

This is an evolution of the partitionable model defined in #27, which moves common attributes up to the pool level to reduce the size of the objects.
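As a rough sketch of the shape of the change (hypothetical Go types for illustration, not the actual API in this PR): attributes that are identical across every device in a pool are stored once at the pool level instead of being repeated on each device.

```go
package sketch

// Hypothetical types for illustration only; the real API in this PR differs.

// DeviceAttribute holds one typed attribute value.
type DeviceAttribute struct {
	StringValue *string
	IntValue    *int64
	BoolValue   *bool
}

// Before: every device repeats attributes that are identical across the pool.
type DeviceFlat struct {
	Name       string
	Attributes map[string]DeviceAttribute
}

// After: the pool carries the common attributes once, and each device only
// lists the attributes that differ from the pool-level ones.
type Pool struct {
	CommonAttributes map[string]DeviceAttribute
	Devices          []PoolDevice
}

type PoolDevice struct {
	Name       string
	Attributes map[string]DeviceAttribute // per-device extras/overrides
}
```

Something like a driver version string is then stored once per pool rather than once per device.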

k8s-ci-robot commented 3 months ago

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: johnbelamaric

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

- ~~[OWNERS](https://github.com/kubernetes-sigs/wg-device-management/blob/main/OWNERS)~~ [johnbelamaric]

Approvers can indicate their approval by writing `/approve` in a comment. Approvers can cancel approval by writing `/approve cancel` in a comment.
pohly commented 3 months ago

> to reduce the size of the objects

That helps reduce the size on average, but for the worst-case analysis, which determines the limits of the slices, we have to assume that the new Attributes field is fully populated and all devices have the maximum number of attributes.
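With some made-up numbers (none of these constants are the real ResourceSlice limits), the worst case looks like this:

```go
package main

import "fmt"

// All limits below are invented for illustration; they are not the real
// DRA limits. The point: a fully populated pool-level Attributes field
// adds to the worst case rather than shrinking it.
func main() {
	const (
		maxDevicesPerSlice = 128 // hypothetical
		maxAttrsPerDevice  = 32  // hypothetical
		maxPoolAttrs       = 32  // hypothetical, for the new field
		bytesPerAttr       = 64  // hypothetical encoded size
	)
	oldWorst := maxDevicesPerSlice * maxAttrsPerDevice * bytesPerAttr
	newWorst := oldWorst + maxPoolAttrs*bytesPerAttr
	fmt.Println("worst case without pool-level attributes:", oldWorst)
	fmt.Println("worst case with pool-level attributes:   ", newWorst)
}
```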

johnbelamaric commented 3 months ago

> > to reduce the size of the objects
>
> That helps reduce the size on average, but for the worst-case analysis, which determines the limits of the slices, we have to assume that the new Attributes field is fully populated and all devices have the maximum number of attributes.

Yes. True.

johnbelamaric commented 3 months ago

I am putting together several options; see my comment in #20.

johnbelamaric commented 3 months ago

Option 4 has some nesting. That's #31. It is much more efficient than this one.

johnbelamaric commented 3 months ago

We could do more levels, but it's not clear the payoff is there.

johnbelamaric commented 3 months ago

Option 3 adds common attributes. Option 4 adds common attributes AND a common partition "map".
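Roughly, with invented type names (the real proposals are this PR and #31):

```go
package sketch

// Invented types for illustration; see this PR and #31 for the real ones.

type Attr struct{ Value string }

// Option 3: only attributes are hoisted to the pool.
type PoolOption3 struct {
	CommonAttributes map[string]Attr
	Devices          []Device
}

// Option 4: attributes AND a shared partition "map" are hoisted; each
// device references a partition layout by name instead of repeating it.
type PoolOption4 struct {
	CommonAttributes map[string]Attr
	PartitionLayouts map[string][]Partition
	Devices          []Device
}

type Device struct {
	Name       string
	LayoutRef  string // Option 4 only: key into PartitionLayouts
	Attributes map[string]Attr
}

type Partition struct {
	Name      string
	MemoryGiB int64
}
```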

klueska commented 3 months ago

I 100% agree we should have shared attributes. My first prototype had it, but at the time you said “let’s not prematurely optimize for size” so we dropped it.

I’m still not sold on nesting though. I also had it originally (in my “recursive” device model), but you all (rightly) convinced me to drop it. And now, after working with flat devices and updating both the example driver and the NVIDIA GPU driver to adhere to them, I’m really happy with the flexibility that a flat model brings us.

I really don’t think nesting buys us much, and I have a strong feeling it will come back to bite us fairly quickly.

thockin commented 3 months ago

I was continuing on the KEP PR, but detoured to these options, so I will copy a comment:

I just find it super weird to have gpu1's shared items consumable from gpu0. This is the thing that is setting me on edge. That implies to me a level of fungibility which doesn't exist. There is a grouping that is smaller than a ResourceSlice but bigger than a Device, and we are not modelling it. Call it a "card" for the moment. A slice doesn't have shared resources, a "card" does. Now you'll probably tell me "actually, one card can borrow resources on another card". In fact, I can already see the (hypothetical) use-case for a channelized <something> which can be effectively RAID'ed into a larger logical device. But that's not this, and (TTBOMK) that doesn't exist yet.
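To be concrete, the grouping I'm gesturing at is something like this (hypothetical names, just a sketch):

```go
package sketch

// Hypothetical sketch of the grouping described above: shared resources
// hang off a "card", not the slice, so one card's shared items are never
// consumable from a device on another card.
type Slice struct {
	Cards []Card
}

type Card struct {
	Name            string
	SharedResources map[string]int64 // consumable only by this card's devices
	Devices         []Device
}

type Device struct {
	Name string
}
```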

I really can see both sides, and I don't mean to be dogmatic. It just smells funny. Let's keep the conversation overall moving forward, and if this is all that's left, we can hash it out.

> My first prototype had it, but at the time you said “let’s not prematurely optimize for size” so we dropped it.

Yeah, we dropped a LOT to get the baseline, and bringing partitions back makes it clear to me that this is one piece that really does make sense.

k8s-triage-robot commented 2 weeks ago

The Kubernetes project currently lacks enough contributors to adequately respond to all PRs.

This bot triages PRs according to the following rules:

- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the PR is closed

You can:

- Mark this PR as fresh with `/remove-lifecycle stale`
- Close this PR with `/close`
- Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale