Open klueska opened 2 months ago
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: klueska
The full list of commands accepted by this bot can be found here.
The pull request process is described here
> The original idea was that the scheduler (or whoever else cares about `ResourceSlices`) only has one object type to subscribe to, and we can simply extend its embedded `ResourceModel` with more types over time.
This is what https://github.com/kubernetes/kubernetes/issues/123699 depends upon: kubelet knows that drivers may access ResourceSlice and ResourceClaim through the REST proxy and denies access to everything else. When combining kubelet from 1.31 with a future DRA driver which wants to publish a "SomeFutureType" instead of a "ResourceSlice", the driver's request will get denied. I'm working on a KEP update for this issue and have a prototype implementation ready that I'll share together with it.
> This actually makes it awkward, however, when we need to split our `Instances` at the lowest level across multiple API server objects. Each object needs to be wrapped in multiple levels of indirection, making it somewhat hard to reason about when trying to reconstruct all objects from a given driver.
There's one additional indirection ("namedResources"), right? I don't find it hard to ignore that - but perhaps I am simply used to it. :shrug:
I think where it got awkward was when I was trying to adapt @johnbelamaric's model to fit into this:
```go
type DeviceModel struct {
	metav1.TypeMeta
	metav1.ObjectMeta

	Spec   DeviceModelSpec
	Status DeviceModelStatus
}

type DeviceModelSpec struct {
	NodeName   *string
	DriverName string
	*DeviceModelVariant
}

type DeviceModelVariant struct {
	DevicePools *DevicePools
	// Other future models
}

type DevicePools struct {
	Pools []DevicePool
}

type DevicePool struct {
	...
}
```
These 5 structs turn into one (with multiple instances of the last as top-level objects).
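As a rough sketch of what that collapse could look like, with `DevicePool` promoted to a top-level object (field names and the stub metadata types here are illustrative, not the PR's actual API):

```go
package main

import "fmt"

// Minimal stand-ins for metav1.TypeMeta / metav1.ObjectMeta so the sketch
// compiles standalone; the real types come from k8s.io/apimachinery.
type TypeMeta struct{ Kind, APIVersion string }
type ObjectMeta struct{ Name string }

// DevicePool as its own self-contained top-level object: the DeviceModel /
// DeviceModelVariant / DevicePools wrappers disappear entirely.
type DevicePool struct {
	TypeMeta
	ObjectMeta
	Spec DevicePoolSpec
}

type DevicePoolSpec struct {
	// NodeName is set for node-local pools (illustrative field).
	NodeName   *string
	DriverName string
}

func main() {
	node := "node-1"
	p := DevicePool{
		TypeMeta:   TypeMeta{Kind: "DevicePool", APIVersion: "resource.k8s.io/v1alpha2"},
		ObjectMeta: ObjectMeta{Name: "node-1-gpu"},
		Spec:       DevicePoolSpec{NodeName: &node, DriverName: "gpu.example.com"},
	}
	fmt.Println(p.Kind, p.Name, p.Spec.DriverName)
}
```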
I'm not saying we necessarily want to do this, but couldn't the kubelet be updated to whitelist any new model objects we define (same as the scheduler needing to know to pull them in too)?
> This is what kubernetes/kubernetes#123699 depends upon: kubelet knows that drivers may access ResourceSlice and ResourceClaim through the REST proxy and denies access to everything else. When combining kubelet from 1.31 with a future DRA driver which wants to publish a "SomeFutureType" instead of a "ResourceSlice", the driver's request will get denied. I'm working on a KEP update for this issue and have a prototype implementation ready that I'll share together with it.
Let's not forget we have versioning in the API. So, we start with v1 without the indirection. One option then is to add a new type; another option is to create DevicePool v2 which in turn has the indirection but round-trips to v1. Kubelet can use an `Unstructured` for passthrough, so it can still check the types without changes and without versioning issues between apiserver/kubelet/driver. Unless I am missing something.
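The `Unstructured` passthrough idea can be sketched without pulling in apimachinery: the gate only needs `kind` (and possibly `apiVersion`) out of the raw bytes, which is exactly what an unstructured decode provides. A minimal sketch, assuming a hypothetical whitelist (the real check lives behind the kubelet REST proxy discussed in kubernetes/kubernetes#123699):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// allowedKinds is a hypothetical whitelist of kinds drivers may touch.
var allowedKinds = map[string]bool{
	"ResourceSlice": true,
	"ResourceClaim": true,
}

// kindOf extracts only the type metadata from a raw object, without
// decoding the full schema. This is what lets the kubelet gate on the
// kind while staying agnostic to version skew between apiserver,
// kubelet, and driver.
func kindOf(raw []byte) (string, error) {
	var meta struct {
		APIVersion string `json:"apiVersion"`
		Kind       string `json:"kind"`
	}
	if err := json.Unmarshal(raw, &meta); err != nil {
		return "", err
	}
	return meta.Kind, nil
}

func main() {
	obj := []byte(`{"apiVersion":"resource.k8s.io/v1alpha2","kind":"ResourceSlice","metadata":{"name":"node-1-gpu"}}`)
	kind, _ := kindOf(obj)
	fmt.Println(kind, allowedKinds[kind]) // an unknown kind would map to false
}
```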
A couple questions as you think through that design:
`<node-name>-<something>`. Are there exceptions to that? Should the driver or the kubelet control it?

> Kubelet can use an Unstructured for passthrough, so it can still check the types without changes and without versioning issues between apiserver/kubelet/driver.

kubelet still needs to check whether a request for some unknown resource should be allowed. Let's discuss in https://github.com/kubernetes/enhancements/pull/4615
> So, we start with v1 without the indirection. One option then is to add a new type; another option is to create DevicePool v2 which in turn has the indirection but round-trips to v1.
This doesn't work without the one-of-many concept already being in v1. A new field can be added to v1 to make the round-trip possible, but a client built against v1 without that field will never see it. We have to define in advance how such a client can detect that it doesn't have some information that it should have.
I spent some time today going through the types in the file `pkg/api/capacity_types.go` in detail. I have proposed my changes to it as an entirely new file (instead of a diff to the original) so that we can more easily comment on it line-by-line without the diff getting in the way.

The biggest thing to note is the following: I'm coming around to the idea that `DevicePool` (or whatever future resource models we come up with) could be top-level objects, rather than wrapping them in multiple levels of indirection like we have today to support the `NamedResources` model. Having all of these levels of indirection doesn't actually add much value. The original idea was that the scheduler (or whoever else cares about `ResourceSlices`) only has one object type to subscribe to, and we can simply extend its embedded `ResourceModel` with more types over time. This actually makes it awkward, however, when we need to split our `Instances` at the lowest level across multiple API server objects. Each object needs to be wrapped in multiple levels of indirection, making it somewhat hard to reason about when trying to reconstruct all objects from a given driver.

On the contrary, by making `DevicePool` (or whatever future resource models we come up with) a top-level API server object, we have a natural way of breaking things apart. Each pool is its own self-contained object, so there is no ambiguity about where the object boundaries should be. After thinking about it for a while, this actually feels like the more natural and "Kubernetes-like" way of abstracting these objects. The major difference is that the scheduler can no longer just subscribe to a single `ResourceSlices` object and have all supported model implementations from all drivers follow from that. It potentially has to subscribe to multiple top-level objects representing different models and react to them accordingly. This seems like a reasonable trade-off, since one would always have to write code that is aware of the different models anyway.

The second biggest thing to note is the following: I have (slightly) renamed and "future-proofed" all of the types that represent named resources and their requests. Whether we want to do this is still being debated, but I wanted to get a concrete proposal for what it would look like as a diff to @johnbelamaric's prototype, rather than continuing to talk in "what-ifs".
The third biggest thing to note is the following: We can now have a meaningful `Status` attached to the top-level type. In this case I have expanded it to include all allocated devices and the claims they have been allocated to.