Open klueska opened 2 months ago
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: klueska
The full list of commands accepted by this bot can be found here.
The pull request process is described here
> The original idea was that the scheduler (or whoever else cares about `ResourceSlices`) only has one object type to subscribe to, and we can simply extend its embedded `ResourceModel` with more types over time.
This is what https://github.com/kubernetes/kubernetes/issues/123699 depends upon: kubelet knows that drivers may access ResourceSlice and ResourceClaim through the REST proxy and denies access to everything else. When combining kubelet from 1.31 with a future DRA driver which wants to publish a "SomeFutureType" instead of a "ResourceSlice", the driver's request will get denied. I'm working on a KEP update for this issue and have a prototype implementation ready that I'll share together with it.
> This actually makes it awkward, however, when we need to split our `Instances` at the lowest level across multiple API server objects. Each object needs to be wrapped in multiple levels of indirection, making it somewhat hard to reason about when trying to reconstruct all objects from a given driver.
There's one additional indirection ("namedResources"), right? I don't find it hard to ignore that - but perhaps I am simply used to it. :shrug:
I think where it got awkward was when I was trying to adapt @johnbelamaric's model to fit into this:
```go
type DeviceModel struct {
	metav1.TypeMeta
	metav1.ObjectMeta

	Spec   DeviceModelSpec
	Status DeviceModelStatus
}

type DeviceModelSpec struct {
	NodeName   *string
	DriverName string
	*DeviceModelVariant
}

type DeviceModelVariant struct {
	DevicePools *DevicePools
	// Other future models
}

type DevicePools struct {
	Pools []DevicePool
}

type DevicePool struct {
	...
}
```
These 5 structs turn into one (with multiple instances of the last as top-level objects).
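As a rough sketch of what that collapse could look like, with `DevicePool` promoted to a top-level object (field names and the stub metadata types here are illustrative, not the PR's actual API):

```go
package main

import "fmt"

// Minimal stand-ins for metav1.TypeMeta / metav1.ObjectMeta so the sketch
// compiles standalone; the real types come from k8s.io/apimachinery.
type TypeMeta struct{ Kind, APIVersion string }
type ObjectMeta struct{ Name string }

// DevicePool as its own self-contained top-level object: the DeviceModel /
// DeviceModelVariant / DevicePools wrappers disappear entirely.
type DevicePool struct {
	TypeMeta
	ObjectMeta
	Spec DevicePoolSpec
}

type DevicePoolSpec struct {
	// NodeName is set for node-local pools (illustrative field).
	NodeName   *string
	DriverName string
}

func main() {
	node := "node-1"
	p := DevicePool{
		TypeMeta:   TypeMeta{Kind: "DevicePool", APIVersion: "resource.k8s.io/v1alpha2"},
		ObjectMeta: ObjectMeta{Name: "node-1-gpu"},
		Spec:       DevicePoolSpec{NodeName: &node, DriverName: "gpu.example.com"},
	}
	fmt.Println(p.Kind, p.Name, p.Spec.DriverName)
}
```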
I'm not saying we necessarily want to do this, but couldn't the kubelet be updated to whitelist any new model objects we define (same as the scheduler needing to know to pull them in too)?
> This is what kubernetes/kubernetes#123699 depends upon: kubelet knows that drivers may access ResourceSlice and ResourceClaim through the REST proxy and denies access to everything else. When combining kubelet from 1.31 with a future DRA driver which wants to publish a "SomeFutureType" instead of a "ResourceSlice", the driver's request will get denied. I'm working on a KEP update for this issue and have a prototype implementation ready that I'll share together with it.
Let's not forget we have versioning in the API. So, we start with v1 without the indirection. One option then is to add a new type; another option is to create DevicePool v2 which in turn has the indirection but round-trips to v1. Kubelet can use an `Unstructured` for passthrough, so it can still check the types without changes and without versioning issues between apiserver/kubelet/driver. Unless I am missing something.
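The `Unstructured` passthrough idea can be sketched without pulling in apimachinery: the gate only needs `kind` (and possibly `apiVersion`) out of the raw bytes, which is exactly what an unstructured decode provides. A minimal sketch, assuming a hypothetical whitelist (the real check lives behind the kubelet REST proxy discussed in kubernetes/kubernetes#123699):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// allowedKinds is a hypothetical whitelist of kinds drivers may touch.
var allowedKinds = map[string]bool{
	"ResourceSlice": true,
	"ResourceClaim": true,
}

// kindOf extracts only the type metadata from a raw object, without
// decoding the full schema. This is what lets the kubelet gate on the
// kind while staying agnostic to version skew between apiserver,
// kubelet, and driver.
func kindOf(raw []byte) (string, error) {
	var meta struct {
		APIVersion string `json:"apiVersion"`
		Kind       string `json:"kind"`
	}
	if err := json.Unmarshal(raw, &meta); err != nil {
		return "", err
	}
	return meta.Kind, nil
}

func main() {
	obj := []byte(`{"apiVersion":"resource.k8s.io/v1alpha2","kind":"ResourceSlice","metadata":{"name":"node-1-gpu"}}`)
	kind, _ := kindOf(obj)
	fmt.Println(kind, allowedKinds[kind]) // an unknown kind would map to false
}
```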
A couple questions as you think through that design:
`<node-name>-<something>`. Are there exceptions to that? Should the driver or the kubelet control it?

> Kubelet can use an Unstructured for passthrough, so it can still check the types without changes and without versioning issues between apiserver/kubelet/driver.

kubelet still needs to check whether a request for some unknown resource should be allowed. Let's discuss in https://github.com/kubernetes/enhancements/pull/4615
> So, we start with v1 without the indirection. One option then is to add a new type; another option is to create DevicePool v2 which in turn has the indirection but round-trips to v1.
This doesn't work without the one-of-many concept already being in v1. A new field can be added to v1 to make the round-trip possible, but a client built against v1 without that field will never see it. We have to define in advance how such a client can detect that it doesn't have some information that it should have.
I spent some time today going through the types in the file `pkg/api/capacity_types.go` in detail. I have proposed my changes to it as an entirely new file (instead of a diff to the original) so that we can more easily comment on it line-by-line without the diff getting in the way.

The biggest thing to note is the following: I'm coming around to the idea that `DevicePool` (or whatever future resource models we come up with) could be top-level objects, rather than wrapping them in multiple levels of indirection like we have today to support the `NamedResources` model. Having all of these levels of indirection doesn't actually add much value. The original idea was that the scheduler (or whoever else cares about `ResourceSlices`) only has one object type to subscribe to, and we can simply extend its embedded `ResourceModel` with more types over time. This actually makes it awkward, however, when we need to split our `Instances` at the lowest level across multiple API server objects. Each object needs to be wrapped in multiple levels of indirection, making it somewhat hard to reason about when trying to reconstruct all objects from a given driver.

On the contrary, by making `DevicePool` (or whatever future resource models we come up with) a top-level API server object, we have a natural way of breaking things apart. Each pool is its own self-contained object, so there is no ambiguity about where the object boundaries should be. After thinking about it for a while, this actually feels like the more natural and "Kubernetes-like" way of abstracting these objects. The major difference is that the scheduler can no longer just subscribe to a single `ResourceSlices` object and have all supported model implementations from all drivers follow from that. It potentially has to subscribe to multiple top-level objects representing different models and react to them accordingly. This seems like a reasonable trade-off, since one would always have to write code that is aware of the different models anyway.

The second biggest thing to note is the following: I have (slightly) renamed and "future-proofed" all of the types that represent named resources and their requests. Whether we want to do this is still being debated, but I wanted to get a concrete proposal for what it would look like as a diff to @johnbelamaric's prototype, rather than continuing to talk in "what-ifs".
The third biggest thing to note is the following: We can now have a meaningful `Status` attached to the top-level type. In this case I have expanded it to include all allocated devices and the claims they have been allocated to.