schrej commented 2 years ago

Note: This basically a reincarnation of the original topic of #2318. Since that issue has a lot of comments already, and a lot of them are just bot commands, I wanted to start over with a summary of what was discussed at KubeCon EU 22.

Motivation

New uses of corev1.ObjectReference are discouraged since about two years now. In my opinion we should go even further and try to get rid of corev1.ObjectReference completely. This is technically a breaking change (but it should be discussed how severe it actually is), so we at least have to wait until corev1beta2. Even though it might take a while until we get there, we should start preparing for it now by either deciding on which existing types we want to use, or which custom reference types we want to add to the API.

We should also try to come up with a somewhat exhaustive lists of the types we need, and add all of the relevant types, even if they are unused for now. This avoids having multiple separate discussions when the need for a new type comes up, and makes designing a consistent API much easier. In addition, new APIs can directly choose the appropriate reference type, making adoption much simpler.

There are several reasons not to use corev1.ObjectReference (some also mentioned in the comment discouraging the use of it):

We ignore both ResourceVersion and FieldPath entirely. In addition we usually do not respect UID. While the latter might be useful in a status, it's usually not necessary to have.
We do not need the Namespace field in a lot of cases. In fact, it even requires to write a lot of code just to keep it. Whenever resources are only allowed to reference objects in the same namespace, we have defaulting and validation code in place to insure this (I think there are few webhooks that do only this). For uses this is only apparent when applying the resources to the API, leading to a bad UX. In the code, the field is usually ignored, as the namespace matches the object anyway. In case the validating webhooks are not working, this could lead to unexpected behaviour.
In a lot of cases we could replace APIVersion with group. This becomes especially relevant when trying to optimise for GitOps: When there is a version in the reference, that version has to be upgraded in Git whenever the API version of the referenced object changes. Since the idea of serving multiple versions is that the client can decide which version to use, we should do so, and get rid of it where we can.

Proposal

I propose to add 6 new reference types. In addition, we should add functions and methods to convert between them and to and from corev1.ObjectReference. This way the types are easier to adopt, both within cluster-api, as well as by providers and other tools.

type ObjectReference struct {
    Group string `json:"group"`
    Kind string `json:"kind"`
    Name string `json:"name"`
    Namespace string `json:"namespace"`
}

// Add these for all types
func NewObjectReference(ref v1.ObjectReference) ObjectReference {}
func (ObjectReference) ToV1Reference() v1.ObjectReference

func (LocalObjectReference) ToNamespaced(namespace string) ObjectReference {}
func (ObjectReference) ToLocal() LocalObjectReference {}

type LocalObjectReference struct {
    Group string `json:"group"`
    Kind string `json:"kind" `
    Name string `json:"name"`
}

func (PinnedObjectReference) Unpin() ObjectReference {}
func (ObjectReference) Pin(uuid string) PinnedObjectReference {}

type PinnedObjectReference struct {
    Group string `json:"group"`
    Kind string `json:"kind"`
    Name string `json:"name"`
    Namespace string `json:"namespace"`
    UID types.UID `json:"uid"`
}

func (PinnedLocalObjectReference) Unpin() LocalObjectReference {}
func (LocalObjectReference) Pin(uuid string) PinnedLocalObjectReference {}

type PinnedLocalObjectReference struct {
    Group string `json:"group"`
    Kind string `json:"kind"`
    Name string `json:"name"`
    UID types.UID `json:"uid"`
}

func (VersionedLocalObjectReference) ToUnversioned() LocalObjectReference {}
func (LocalObjectReference) ToVersioned(version string) VersionedLocalObjectReference {}

type VersionedLocalObjectReference struct {
    Group string `json:"group"`
    Version string `json:"version"`
    Kind string `json:"kind" `
    Name string `json:"name"`
}

Alternatives

@vincepri mentioned that he would prefer using other core types if possible, instead of implementing our own. Currently there are three other types available:

LocalObjectReference - only with name
TypedLocalObjectReference - only with name, apigroup and kind
SecretReference - only with name and namespace

Only the TypedLocalObjectReference seems to cover our needs, as we require typed references in most cases due to the modularity of Cluster API. We might be able to use the LocalObjectReference in some cases, e.g. for the References to Cluster.

Iirc the main motivation to use corev1 types was twofold:

We should try to avoid having to maintain additional types Since these types are only used as part of our API resources, and not as individual ones, I don't think it's a big deal to add them.
It is confusing/impractical for users to have to deal with our custom types For the aforementioned reasons I think the custom types improve user experience, and at the same time make our code more maintainable and easier to follow. Right now a lot of internal tools accept corev1.ObjectReference, but ignore most parameters (without documentation). By using custom types, utilities can accept structs that only contain fields that they support. Using the conversion functions it is easy to get a corev1.ObjectReference when required, or convert from one. Users have to import our API package anyway, so they have the conversion methods and functions available if they are part of that package.

Is replacing existing references a breaking change?

Yes. We are removing fields from the API that were present before, so it certainly is a breaking API change.

But I'd like to argue that the impact of that breaking change is so small, that we can justify to make it anyway, at least when moving to v1beta2.

First of all, the change does not remove any features. In fact, it does not change any behavior at all. All removed fields are either completely unused, made useless by validation (namespaces) or not very useful (version part of apiversion). Replacing apiversion with group is the only thing that makes this change really noticeable as it requires a small amount of manual migration. If we stick with apiversion the change won't be noticed by most users when using yaml or json.

In code you will of course have to replace object references with the new custom types. But converting to the newer API will require changing the import path, and having to change a few struct names is not that much effort. Even when corev1.ObjectReference is used heavily, the conversion functions make it easy to convert between them, and it can be documented which fields will be dropped, making it much clearer what is going on.

The only case where it does actually have an impact, is when users use fields of the references for their own purposes, e.g. reconciling them using a custom controller. In addition, users that explicitly set the namespace field will need to stop doing so.

/area api /kind cleanup /kind api-change

schrej commented 2 years ago

Since it's probably going to come up: if we're strictly following the official API conventions, we cannot make any incompatible changes, *even between api versions**, since we moved to v1beta1. I find this to be very limiting, and also do not really understand what the purpose of having multiple API versions is, if you're not allowed to change things between them. I get that forward compatibility is nice, but beta versions are still different versions, and allow to introduce breaking changes gracefully through conversion.

I would prefer to use the alpha/beta/stable differentiation as an indicator of how long a version will be supported and how significant changes to it will be, instead of whether it will have breaking changes at all. Introducing breaking changes should still be possible at beta level, but there needs to be full conversion support and the previous beta version should be supported for a long enough time for everyone to migrate (e.g. 1 year). Fully forward compatible changes (aka. adding optional or defaultable fields) do not even require a version change, so why have multiple beta versions at all? Only to indicate how stable and well-tested the implementation behind the API is?

TL;DR: If we don't introduce forward-compatibility breaking changes with v1beta2, we should move directly to v1 and continue with work on v2alpha1.

dlipovetsky commented 2 years ago

6228 was a PR I opened that tried to address the lack of defaults for the namespace field in the Infra and Bootstrap references in MachineDeployments. I closed it, because the complexity of the fix would outweigh its benefits. However, if we move away from ObjectReferences, we both address the issue and reduce overall complexity.

fabriziopandini commented 2 years ago

I'm +1 in removing unnecessary fields when moving to v1beta2.

However at the same time it seems to me we are trying to tackle a generic problem in API modeling so:

I'm in favor of using core types vs creating custom ones.
Looping into @JoelSpeed who recently invested in this area, might be he can give some guidance.
I would suggest opening an issue in K/K or asking in api-machinery if there is already a response to the question of providing better support for GitOps in reference fields (TBH, I'm not sure if the problem is modeling or CAPI behavior of upgrading references to ensure all the resources are compliant to the same contract)

JoelSpeed commented 2 years ago

So I think I'm going to go against the grain here (seems @vincepri and @fabriziopandini would prefer to use existing types) and provide the arguments I've heard for not using the existing types (which @schrej has already done for the most part and @deads2k did in the PR referenced above), which can then perhaps influence some conversations about which of these we care about and which we don't:

(1) The existing types have additional fields that mean nothing in our API (I think everyone agress on this)

As a user, I can kubectl explain the API and I will see in there, fields such as ResourceVersion and Field, which I may then set, expecting them to do something (note they have comments which end up in the API schema explaining what they should do).

It is bad practice to have fields in an API which don't do anything, it's confusing to end users so we should avoid this where possible

(2) If you rely on an imported type, it may change

If we import something like LocalObjectReference from corev1, we have no control over that type. At some point (while unlikely), that type may gain a new field. Once we revendor the API repo (or someone using our API revendors it within their project), suddenly we have gained a new field and we are back to issue (1) where we have a field in our API that means nothing to our controllers.

We should avoid using external types so that we can control the fields in our API. We can never guarantee that an external type won't change.

(3) We can document our types with context of their actual usage

As in point (1), this goes back to the end user experience. Users interact with openapi schemas in general via either generated docs or kubectl explain. If we own the type, we own the documentation. This means we can make specific comments in that API to explain to a user how this field should be used, and what to expect if they set it, or don't, and which values might be expected as well. We can also add domain specific validations if we so desire.

This also means that, while the idea of generic object references reduces boilerplate, in general you should create very focused reference types which are used only within a particularly focused part of the API.

(4) Types should be scoped to their specific usage

The combination of 2 and 3 means we shouldn't create generic reference types, for example, if we had a capiv1.ObjectReference that was being used in a InfraCluster and an InfraMachine, and then one of these types needs to add a new field to its ObjectReference, technically that is allowed, but, you would now have to add it to both of these types, rather than just the one type, again, leading back to problem (1).

Depending on your definition of the boundary of your API compatibility, you could argue that as long as the resource serializes correctly and, to an end user (who interacts with YAML/JSON) it hasn't broken, then in theory you could change one of these types to some new reference without causing issues. But, you Go consumers would need to be updated to meet the new type. If you have limited Go consumers, that's an easy change to own, I suspect for CAPI we would want to avoid this.

type ObjectReference struct {
  Group string `json:"group"`
  Kind string `json:"kind"`
  Name string `json:"name"`
  Namespace string `json:"namespace"`
}

@schrej, you may be aware of this already, but just in case, in each of the examples you've given here, you use a Kind field. This has become discouraged as Kind could be ambiguous, while Resource is guaranteed to be unique (and resource to kind is injective, kind to resource is not) https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/api-conventions.md#kind-vs-resource

IMO this thread is quite one sided at the moment, we have a lot of arguments for not using the core types, I think it would be good to get some balance on the arguments against as well

We should try to avoid having to maintain additional types

Since these types are only used as part of our API resources, and not as individual ones, I don't think it's a big deal to add them.

Could we expand on why this is? What are the issues with maintaining additional types?

It is confusing/impractical for users to have to deal with our custom types

Likewise, can we expand on this? Who are our users here? Are we talking about people coding Go to use our API, or are we talking about people writing YAML? Because I think they are very different types of users and, as per previous arguments, I think actually this makes YAML writing easier.

I can see for the Go consumers it means that common utils for corev1 types are no longer usable, but perhaps we can solve these issues with some polymorphism within our code?

schrej commented 2 years ago

Could we expand on why this is? What are the issues with maintaining additional types?

I don't see any issue with that.

It is confusing/impractical for users to have to deal with our custom types

Likewise, can we expand on this? Who are our users here? Are we talking about people coding Go to use our API, or are we talking about people writing YAML? Because I think they are very different types of users and, as per previous arguments, I think actually this makes YAML writing easier.

The actual types are only relevant for Go users/consumers in my opinion. If you're only using YAML to interact, you won't even notice that they change if the serialisation stays identical, as you've already mentioned. That argument came up during the discussion at KubeCon. I see two reasons why it can be difficult for Go users:

figuring out which type has to be used where With modern tooling that provides good autocompletion i don't think that this is a big deal
using the types in conjunction with other components or libraries might be hard, as they are likely to expect corev1 types as the standard If we add methods to convert to and from those types, this also shouldn't be a big deal, and just comes down to figuring out that those functions and methods exist. If we want to convert to those types internally, we'll have to implement them anyway as long as we still support current API versions.

(4) Types should be scoped to their specific usage

So you would implement one type per reference instead? So way more types than the current proposal? I don't disagree with that Idea, but I'm not sure if it's necessary. It'll also make it harder to use those types internally.

Thanks for pointing out that API doc section, I didn't know about it (or forgot that it exists).

JoelSpeed commented 2 years ago

If we add methods to convert to and from those types, this also shouldn't be a big deal, and just comes down to figuring out that those functions and methods exist. If we want to convert to those types internally, we'll have to implement them anyway as long as we still support current API versions.

I think this really depends on what we are using. If you are converting into a bigger or smaller type, it may not be lossless, in which case, those functions may not work as expected. I would expect to maintain a set of helpers that do what we need to do, we could even make them work with interfaces if we wanted to, to make things a bit easier/more reusable. There's a few ways we can tackle this

So you would implement one type per reference instead? So way more types than the current proposal? I don't disagree with that Idea, but I'm not sure if it's necessary. It'll also make it harder to use those types internally.

I think this is the technically correct way to do it, otherwise we open ourselves up for potential future issues, though as I mentioned, if you don't care about Go API compatibility, you can evolve the Go API later. It does make them harder to use if you are working with lots of helpers yes, but I think there's ways around it

As with all of this, we need to decide on our boundaries. Some people will want to be super strict about the conventions, others will not. How much we care as a community is an open discussion. I think it's interesting to discuss the pros and cons of each approach

vincepri commented 2 years ago

Hey folks, catching up on this conversation, wanted to clarify some thoughts from my side and potentially try to find a path forward.

From my perspective, using the core/v1 types has always been a preferred path for consistency with the rest of the ecosystem of controllers and to provide a familiar way to reference resources to users (old and new).

Given the limitation we've put in place for object references, the core/v1 types are probably not the best fit.

Is Cluster API the only project that needs better references? Can API Machinery help us reviewing and adding new types to the current set of references?
Can we still retain json-compatibility between newly introduced types and the core/v1 ObjectReference? Is still necessary?
If we add new types to Cluster API, I'd prefer to have a single reference object, not one for each type which can become quickly unmaintainable; it also raises the bar when coding against the APIs. Consistency of both developer and user experience is a key fundamental that we should maintain.

vincepri commented 2 years ago

/milestone Next

vincepri commented 2 years ago

Next steps discussed at the community meeting:

Reach out to API Machinery folks to explain our use cases and issues with the current types (@schrej)

fabriziopandini commented 2 years ago

/triage accepted

@sbueringer to comment about recent discovery on this topic

k8s-triage-robot commented 1 year ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Mark this issue or PR as fresh with /remove-lifecycle stale
Mark this issue or PR as rotten with /lifecycle rotten
Close this issue or PR with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

fabriziopandini commented 1 year ago

/lifecycle frozen

sbueringer commented 1 year ago

@sbueringer to comment about recent discovery on this topic

Still on my list, just low priority compared to a lot of other things...

JoelSpeed commented 1 year ago

I'm looking into building out a new bootstrap and controlplane provider implementation and would like to see this issue resolved before I complete the API for it. Is there anything I can do to help move this long?

IMO, we should introduce new references that are well scoped to the current usage and observe current conventions

type InfrastructureObjectReference struct {
  Group string
  Resource string
  Name string
}

This would also then mean we could drop the current patching of the version in the apiVersion (which could lead to SSA issues) and would be able to just look up and use the preferred version for each API group. And we would no longer have to enforce the namespace field, as it's redundant anyway.

The tricky part, I would hazard a guess at, is conversion. Namespace field we drop, so that's fine. Resource becomes Kind/vice versa, can be mapped with a rest mapper in theory. Group to apiVersion, again, should be possible with a restmapper.

When is the best time to raise this to make progress? Are we planning a new API version any time?

sbueringer commented 1 year ago

@JoelSpeed I agree with your proposal in general. Can you point me to the reasoning behind using Resource instead of Kind? I sort of remember that upstream is doing this, but from a user perspective I'm not sure if resource is something that folks are familiar with, while Kind is very well known.

JoelSpeed commented 1 year ago

@schrej, you may be aware of this already, but just in case, in each of the examples you've given here, you use a Kind field. This has become discouraged as Kind could be ambiguous, while Resource is guaranteed to be unique (and resource to kind is injective, kind to resource is not) https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/api-conventions.md#kind-vs-resource

From one of my earlier comments in the thread

sbueringer commented 12 months ago

Just to surface it here. Had a quick discussion with Joel about Kind vs Resource at KubeCon and I think it's fair to summarize that we want to take a closer look if it's really necessary in our case to use resource instead of kind. If it's not, it's probably better for our users as they are usually more familiar with what a kind is (it is in every Kubernetes YAML) and we can avoid the migration effort from kind to resource

fabriziopandini commented 7 months ago

/priority important-longterm

kubernetes-sigs / cluster-api

prepare replacing corev1.ObjectReference in v1beta2 #6539

Motivation

Proposal

Alternatives

Is replacing existing references a breaking change?

(1) The existing types have additional fields that mean nothing in our API (I think everyone agress on this)

(2) If you rely on an imported type, it may change

(3) We can document our types with context of their actual usage

(4) Types should be scoped to their specific usage

We should try to avoid having to maintain additional types

It is confusing/impractical for users to have to deal with our custom types