schrej opened this issue 2 years ago
Since it's probably going to come up: if we're strictly following the official API conventions, we cannot make any incompatible changes, **even between API versions**, since we moved to `v1beta1`. I find this to be very limiting, and I also do not really understand the purpose of having multiple API versions if you're not allowed to change things between them. I get that forward compatibility is nice, but beta versions are still different versions, and they allow introducing breaking changes gracefully through conversion.
I would prefer to use the alpha/beta/stable differentiation as an indicator of how long a version will be supported and how significant changes to it will be, instead of whether it will have breaking changes at all. Introducing breaking changes should still be possible at the beta level, but there needs to be full conversion support, and the previous beta version should be supported long enough for everyone to migrate (e.g. 1 year). Fully forward-compatible changes (i.e. adding optional or defaultable fields) do not even require a version change, so why have multiple beta versions at all? Only to indicate how stable and well-tested the implementation behind the API is?
TL;DR: If we don't introduce forward-compatibility-breaking changes with `v1beta2`, we should move directly to `v1` and continue with work on `v2alpha1`.
I'm +1 on removing unnecessary fields when moving to `v1beta2`.
However, at the same time it seems to me that we are trying to tackle a generic problem in API modeling.
So I think I'm going to go against the grain here (it seems @vincepri and @fabriziopandini would prefer to use existing types) and lay out the arguments I've heard for not using the existing types (which @schrej has already done for the most part, and @deads2k did in the PR referenced above). These can then perhaps inform a conversation about which of these concerns we care about and which we don't:
1. As a user, I can `kubectl explain` the API and will see fields such as `ResourceVersion` and `FieldPath`, which I may then set, expecting them to do something (note they have comments which end up in the API schema explaining what they should do). It is bad practice to have fields in an API which don't do anything; it's confusing to end users, so we should avoid this where possible.
2. If we import something like `LocalObjectReference` from `corev1`, we have no control over that type. At some point (while unlikely), that type may gain a new field. Once we revendor the API repo (or someone using our API revendors it within their project), we have suddenly gained a new field and are back to issue (1), where we have a field in our API that means nothing to our controllers. We should avoid using external types so that we can control the fields in our API; we can never guarantee that an external type won't change.
3. As in point (1), this goes back to the end-user experience. Users generally interact with OpenAPI schemas via either generated docs or `kubectl explain`. If we own the type, we own the documentation. This means we can make specific comments in that API to explain to a user how the field should be used, what to expect if they set it or don't, and which values might be expected. We can also add domain-specific validations if we so desire.
4. This also means that, while the idea of generic object references reduces boilerplate, in general you should create very focused reference types which are used only within a particularly focused part of the API.
The combination of 2 and 3 means we shouldn't create generic reference types. For example, if we had a `capiv1.ObjectReference` that was used in both an `InfraCluster` and an `InfraMachine`, and then one of these types needed to add a new field to its `ObjectReference`, that is technically allowed, but you would now have to add the field to both of these types rather than just one, again leading back to problem (1).
Depending on your definition of the boundary of your API compatibility, you could argue that as long as the resource serializes correctly and, to an end user (who interacts with YAML/JSON), nothing has broken, then in theory you could change one of these types to some new reference without causing issues. But your Go consumers would need to be updated to the new type. If you have limited Go consumers, that's an easy change to own; I suspect for CAPI we would want to avoid this.
```go
type ObjectReference struct {
	Group     string `json:"group"`
	Kind      string `json:"kind"`
	Name      string `json:"name"`
	Namespace string `json:"namespace"`
}
```
@schrej, you may be aware of this already, but just in case: in each of the examples you've given here, you use a `Kind` field. This has become discouraged, as `Kind` could be ambiguous, while `Resource` is guaranteed to be unique (resource to kind is injective, kind to resource is not): https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/api-conventions.md#kind-vs-resource
IMO this thread is quite one-sided at the moment; we have a lot of arguments for not using the core types. I think it would be good to get some balance on the arguments against as well.
> We should try to avoid having to maintain additional types
>
> Since these types are only used as part of our API resources, and not as individual ones, I don't think it's a big deal to add them.

Could we expand on why this is? What are the issues with maintaining additional types?
> It is confusing/impractical for users to have to deal with our custom types

Likewise, can we expand on this? Who are our users here? Are we talking about people coding Go to use our API, or are we talking about people writing YAML? Because I think they are very different types of users and, as per the previous arguments, I think this actually makes YAML writing easier.
I can see that for Go consumers it means common utils for `corev1` types are no longer usable, but perhaps we can solve these issues with some polymorphism within our code?
> Could we expand on why this is? What are the issues with maintaining additional types?

I don't see any issue with that.
> It is confusing/impractical for users to have to deal with our custom types
>
> Likewise, can we expand on this? Who are our users here? Are we talking about people coding Go to use our API, or are we talking about people writing YAML? Because I think they are very different types of users and, as per previous arguments, I think actually this makes YAML writing easier.

The actual types are only relevant for Go users/consumers, in my opinion. If you're only using YAML to interact, you won't even notice that they changed if the serialisation stays identical, as you've already mentioned. That argument came up during the discussion at KubeCon. I see two reasons why it can be difficult for Go users:
> (4) Types should be scoped to their specific usage

So you would implement one type per reference instead? That would be far more types than the current proposal. I don't disagree with that idea, but I'm not sure it's necessary. It will also make it harder to use those types internally.
Thanks for pointing out that API doc section; I didn't know about it (or forgot that it exists).
> If we add methods to convert to and from those types, this also shouldn't be a big deal, and just comes down to figuring out that those functions and methods exist. If we want to convert to those types internally, we'll have to implement them anyway as long as we still support current API versions.
I think this really depends on what we are using. If you are converting into a bigger or smaller type, it may not be lossless, in which case those functions may not work as expected. I would expect to maintain a set of helpers that do what we need; we could even make them work with interfaces if we wanted to, to make things a bit easier/more reusable. There are a few ways we can tackle this.
> So you would implement one type per reference instead? So way more types than the current proposal? I don't disagree with that idea, but I'm not sure if it's necessary. It'll also make it harder to use those types internally.
I think this is the technically correct way to do it; otherwise we open ourselves up to potential future issues. Though, as I mentioned, if you don't care about Go API compatibility, you can evolve the Go API later. It does make the types harder to use if you are working with lots of helpers, yes, but I think there are ways around it.
As with all of this, we need to decide on our boundaries. Some people will want to be super strict about the conventions; others will not. How much we care as a community is an open discussion. I think it's interesting to discuss the pros and cons of each approach.
Hey folks, catching up on this conversation, wanted to clarify some thoughts from my side and potentially try to find a path forward.
From my perspective, using the core/v1 types has always been a preferred path for consistency with the rest of the ecosystem of controllers and to provide a familiar way to reference resources to users (old and new).
Given the limitation we've put in place for object references, the core/v1 types are probably not the best fit.
/milestone Next
Next steps discussed at the community meeting:
/triage accepted
@sbueringer to comment about recent discovery on this topic
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed

You can:

- Mark this issue as fresh with `/remove-lifecycle stale`
- Mark this issue as rotten with `/lifecycle rotten`
- Close this issue with `/close`
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
/lifecycle frozen
> @sbueringer to comment about recent discovery on this topic
Still on my list, just low priority compared to a lot of other things...
I'm looking into building out a new bootstrap and control plane provider implementation and would like to see this issue resolved before I complete the API for it. Is there anything I can do to help move this along?
IMO, we should introduce new references that are well scoped to the current usage and observe current conventions:
```go
type InfrastructureObjectReference struct {
	Group    string
	Resource string
	Name     string
}
```
This would also mean we could drop the current patching of the version in the `apiVersion` (which can lead to SSA issues) and just look up and use the preferred version for each API group. And we would no longer have to enforce the `namespace` field, as it's redundant anyway.
The tricky part, I would hazard a guess, is conversion. The `Namespace` field we drop, so that's fine. `Resource` becomes `Kind` (and vice versa), which can be mapped with a REST mapper in theory. `Group` to `apiVersion`, again, should be possible with a REST mapper.
When is the best time to raise this to make progress? Are we planning a new API version any time?
@JoelSpeed I agree with your proposal in general. Can you point me to the reasoning behind using `Resource` instead of `Kind`? I sort of remember that upstream is doing this, but from a user perspective I'm not sure `resource` is something folks are familiar with, while `kind` is very well known.
> @schrej, you may be aware of this already, but just in case: in each of the examples you've given here, you use a `Kind` field. This has become discouraged as `Kind` could be ambiguous, while `Resource` is guaranteed to be unique (resource to kind is injective, kind to resource is not): https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/api-conventions.md#kind-vs-resource
From one of my earlier comments in the thread
Just to surface it here: I had a quick discussion with Joel about Kind vs Resource at KubeCon, and I think it's fair to summarize that we want to take a closer look at whether it's really necessary in our case to use resource instead of kind. If it's not, it's probably better for our users, as they are usually more familiar with what a kind is (it appears in every Kubernetes YAML), and we can avoid the migration effort from kind to resource.
/priority important-longterm
Note: This is basically a reincarnation of the original topic of #2318. Since that issue already has a lot of comments, many of them just bot commands, I wanted to start over with a summary of what was discussed at KubeCon EU 22.
## Motivation
New uses of `corev1.ObjectReference` have been discouraged for about two years now. In my opinion we should go even further and try to get rid of `corev1.ObjectReference` completely. This is technically a breaking change (though it should be discussed how severe it actually is), so we at least have to wait until `v1beta2`. Even though it might take a while until we get there, we should start preparing for it now by deciding either which existing types we want to use, or which custom reference types we want to add to the API.

We should also try to come up with a somewhat exhaustive list of the types we need, and add all of the relevant types, even if they are unused for now. This avoids having multiple separate discussions whenever the need for a new type comes up, and makes designing a consistent API much easier. In addition, new APIs can directly choose the appropriate reference type, making adoption much simpler.
There are several reasons not to use `corev1.ObjectReference` (some also mentioned in the comment discouraging its use):

- We ignore `ResourceVersion` and `FieldPath` entirely. In addition, we usually do not respect `UID`. While the latter might be useful in a status, it's usually not necessary to have.
- We don't need the `Namespace` field in a lot of cases. In fact, it even requires writing a lot of code just to keep it: whenever resources are only allowed to reference objects in the same namespace, we have defaulting and validation code in place to ensure this (I think there are a few webhooks that do only this). For users this is only apparent when applying the resources to the API, leading to a bad UX. In the code, the field is usually ignored, as the namespace matches the object anyway. In case the validating webhooks are not working, this could lead to unexpected behaviour.
- We want to replace `APIVersion` with `group`. This becomes especially relevant when trying to optimise for GitOps: when there is a version in the reference, that version has to be upgraded in Git whenever the API version of the referenced object changes. Since the idea of serving multiple versions is that the client can decide which version to use, we should do so, and get rid of the version where we can.

## Proposal
I propose to add 6 new reference types. In addition, we should add functions and methods to convert between them and to and from `corev1.ObjectReference`. This way the types are easier to adopt, both within cluster-api and by providers and other tools.

## Alternatives
@vincepri mentioned that he would prefer using other core types if possible, instead of implementing our own. Currently there are three other types available:

- `LocalObjectReference` - only with `name`
- `TypedLocalObjectReference` - only with `name`, `apiGroup` and `kind`
- `SecretReference` - only with `name` and `namespace`
Only `TypedLocalObjectReference` seems to cover our needs, as we require typed references in most cases due to the modularity of Cluster API. We might be able to use `LocalObjectReference` in some cases, e.g. for the references to `Cluster`.
Iirc, the main motivation to use `corev1` types was twofold: consistency with the rest of the ecosystem, and compatibility with existing utilities that accept `corev1.ObjectReference` but ignore most parameters (without documentation). By using custom types, utilities can accept structs that only contain fields that they support. Using the conversion functions, it is easy to get a `corev1.ObjectReference` when required, or to convert from one. Users have to import our API package anyway, so they have the conversion methods and functions available if they are part of that package.

## Is replacing existing references a breaking change?
Yes. We are removing fields from the API that were present before, so it certainly is a breaking API change. But I'd like to argue that the impact of this breaking change is so small that we can justify making it anyway, at least when moving to v1beta2.
First of all, the change does not remove any features. In fact, it does not change any behavior at all. All removed fields are either completely unused, made useless by validation (namespaces), or not very useful (the version part of `apiVersion`). Replacing `apiVersion` with `group` is the only thing that makes this change really noticeable, as it requires a small amount of manual migration. If we stick with `apiVersion`, the change won't be noticed by most users when using YAML or JSON.

In code you will of course have to replace object references with the new custom types. But converting to the newer API will require changing the import path anyway, and having to change a few struct names is not that much effort. Even where `corev1.ObjectReference` is used heavily, the conversion functions make it easy to convert between the types, and it can be documented which fields will be dropped, making it much clearer what is going on.

The only case where this actually has an impact is when users use fields of the references for their own purposes, e.g. reconciling them using a custom controller. In addition, users that explicitly set the namespace field will need to stop doing so.
/area api
/kind cleanup
/kind api-change