knative / eventing

Event-driven application platform for Kubernetes
https://knative.dev/docs/eventing
Apache License 2.0
1.39k stars 582 forks source link

[Experimental] `KReference.Group` to avoid being explicit on the `APIVersion` #5086

Open slinkydeveloper opened 3 years ago

slinkydeveloper commented 3 years ago

Description I propose to add a new field to KReference, called Group. For example, in order to refer to a KafkaChannel, more than using this one:

Spec:
  Channel:
    API Version:  messaging.knative.dev/v1alpha1
    Kind:         KafkaChannel
    Name:         testchannel

The user will be able to just use this:

Spec:
  Channel:
    Group:  messaging.knative.dev
    Kind:         KafkaChannel
    Name:         testchannel

This approach brings the following benefits:

When the api version is needed, an algorithm can just retrieve the latest for the provided api group and kind and then query the api server using such version.

Background This came out while working on the KafkaChannel v1alpha1 removal: https://github.com/knative-sandbox/eventing-kafka/issues/426#issuecomment-800991695. This upgrade ended up being quite painful and we even got it wrong with brokers https://github.com/knative-sandbox/eventing-kafka/issues/624.

Not tieing to the api version seems to me logically more correct, because the api version is a detail of the resource. With the reference system we have today, every time we need to remove an old api, we'll have to ask users to accept some downtime to fix the api version manually, or we need to end up with hacks like https://github.com/knative/eventing/pull/5085/files#diff-74513e8fb938944cdb1b786c568c843a282eecd46ba5440a4374763245e2ffeaR30. Both are in my opinion unacceptable.

Exit Criteria Users are now able to use references without explicitly define the api version, but just using the api group.

Experimental flag name: kreference-group

Experimental feature stages plan

Below the proposed plan for the feature stages (this list implicitly includes the requirements defined in the process)

Affected WG

Prior discussions

matzew commented 3 years ago

Not tieing to the api version also seems more logically correct, because the api version is a detail of the resource.

it sounds like a limitation in the first place... but I am not sure if ignoring the version is a good idea... b/c they shapes could be completely different (e.g. v1alpha1 versus v2)

/cc @vaikas

slinkydeveloper commented 3 years ago

b/c they shapes could be completely different (e.g. v1alpha1 versus v2)

But the shape ain't nothing to do with the fact that I'm referring to A or B or C, right?

vaikas commented 3 years ago

Yeah, the shape is different so you need to know which version of the resource you're getting so you can patch the right resource. Also, knowing the version is important to set up the informers, etc. otherwise you're going to have racy time.

slinkydeveloper commented 3 years ago

Yeah, the shape is different so you need to know which version of the resource you're getting so you can patch the right resource.

Can't we discover that while fetching the resource? And isn't the storage version one anyway for every CRD?

vaikas commented 3 years ago

Yes, that's why the label on the resource indicated which duck it conforms to. I don't quite understand what the storage has to do with it, but yes, there can only be one storage version for a given CRD.

slinkydeveloper commented 3 years ago

I don't quite understand what the storage has to do with it, but yes, there can only be one storage version for a given CRD.

And that's exactly my point. When the user gives us:

Spec:
  Channel:
    API Version:  messaging.knative.dev
    Kind:         KafkaChannel
    Name:         testchannel

The duck controller can discover from kube what is the storage version of messaging.knative.dev KafkaChannel, and then the controller can just query the stored versions to perform the changes on the resource instance. It doesn't need to know the "original version" of the crd, it won't make any difference to it.

E.g. in this case, the controller finds out v1beta1 is the stored version, so it will just use messaging.knative.dev/v1beta1 KafkaChannel to modify the subscribers list

grantr commented 3 years ago

This is going to be a tough one to fix since the reference shape is foundational to a lot of our apis.

slinkydeveloper commented 3 years ago

Please check out the https://github.com/knative/eventing/pull/5131

aliok commented 3 years ago

5131 looks good to me.

The PR does something only if the version is missing the in the subscription ref. That is backward compatible.

I don't think this would require us to change any other code to adapt the code here.

slinkydeveloper commented 3 years ago

BTW, how hard would it be to use latest served version instead of storage version? There can be multiple served versions and they're strings. Some semver magic?

I honestly don't wanna dig in the semver magic :smile: I also think is more "correct" to use the storage version because:

n3wscott commented 3 years ago

An apiVersion without version is not an apiVersion.

5131 is breaking the contract we have with the resources. The subscription should just go ready false and an operator should fix it. No need for magic here.

matzew commented 3 years ago

Can you share details on what you mean with "an operator should fix it" ?

slinkydeveloper commented 3 years ago

Looping in eventing wg leads @lionelvillard @grantr @vaikas @devguyio how should we move this one forward? Is this a good candidate as experimental feature (since it's a behavioural change of a field?)

omerbensaadon commented 3 years ago

FWIW, this is cleaner UX

grantr commented 3 years ago

My concerns with this feature are technical. The k8s API requires a version identifier to access a resource, so we have to deduce what the version is, and there may be situations in which the version cannot be deduced or the deduction is incorrect. If the technical difficulties can be worked out, then I agree this change would result in an improved UX.

The current behavior must always be supported.

IMO allowing the APIVersion field to omit the version is going to cause issues because it's harder to know what the user's intent is, and APIVersion everywhere else in the k8s API means "Group + Version". Instead, I'd prefer to add an APIGroup field to the reference type (though in Go, this should probably be implemented as a separate type rather than modifying the existing KRef type) because that's a clearer statement of what the field is expected to contain, and a clearer statement about the user's intent. I could be convinced I'm wrong based on actual prototypes and usage, so this is a recommendation, not a mandate.

Practically, I'm concerned that adding this fundamental experiment to eventing core now will disrupt the 1.0 spec and conformance work, which for now is a Knative-wide priority. I suggest working out the technical details and UX in an experimental feature of an alternate broker implementation rather than eventing core itself.

vaikas commented 3 years ago

@slinkydeveloper Thanks for nudging us and making sure this gets eyeballs, and our apologies for taking so long to find the time to make sure we bring our concerns out in a researched manner.

TL;DR:

@n3wscott and I just spent a bunch of time going through the deep bowels of the codes and the helpers and what various parsers that k8s ecosystem (including dynamic clients expect, etc. etc. etc.).

Just to make sure we're all on the same page and we have concrete examples, bear with this lengthy note and revisiting some basic concepts :)

APIVersion has a very specific and well understood meaning (just like @grantr points out above) but unfortunately also has legacy issues (for example) Service looks like this in ObjectReference: {Kind:Service Namespace:test-embssksj Name:t0-fucnyczu UID: APIVersion:v1 ResourceVersion: FieldPath:}

However Broker looks like this: {Kind:Broker Namespace:test-embssksj Name:routing-test-6 UID: APIVersion:eventing.knative.dev/v1 ResourceVersion: FieldPath:}

And Deployment looks like this: {Kind:Deployment Namespace:test-embssksj Name:t0-fucnyczu UID: APIVersion:apps/v1 ResourceVersion: FieldPath:}

So, given these, and if for example you use a standard way to parse these (like this one below) this would then mean that we'd have to special case around the special case due to k8s 'core' special handling. https://github.com/kubernetes/apimachinery/blob/57f2a0733447cfd41294477d833cce6580faaca3/pkg/runtime/schema/group_version.go#L209-L227

func ParseGroupVersion(gv string) (GroupVersion, error) {
    // this can be the internal version for the legacy kube types
    // TODO once we've cleared the last uses as strings, this special case should be removed.
    if (len(gv) == 0) || (gv == "/") {
        return GroupVersion{}, nil
    }
    switch strings.Count(gv, "/") {
    case 0:
        return GroupVersion{"", gv}, nil
    case 1:
        i := strings.Index(gv, "/")
        return GroupVersion{gv[:i], gv[i+1:]}, nil
    default:
        return GroupVersion{}, fmt.Errorf("unexpected GroupVersion string: %v", gv)
    }
}

(I'm sure the TODO will be removed any day now ;) )

Also as part of the learnings with duck types and their usage in anger has lead to. https://github.com/knative-sandbox/discovery/blob/main/pkg/apis/discovery/v1alpha1/clusterducktype_types.go#L155-L180

We (and sounds like @grantr) feel the best way forward is probably to not overload the semantic meaning of APIVersion, but to add Group (which is again a k8s standard concept) field into the KReference. We might need to special case the validation of KReference for resources whose controllers do this lookup.

This still leaves the problem of how do we get the Version, and we can use the logic in this PR to do that for the time being and see how that goes. But in the future it would be better to utilize some of the Discovery work for better ducks support and without us redefining the well understood k8s fields / names / semantics. The whole goal of the Discovery work is to solve the mapping of duck version to versions of that resource implementing that duck in a more generic as well as more discoverable manner overall.

And as @aliok points out we do have the annotation to tell us the duck shape once it's fetched but the discovery would allow us to not rely on the annotations, but will work fine until discovery lands in all it's fullness.

slinkydeveloper commented 3 years ago

The k8s API requires a version identifier to access a resource, so we have to deduce what the version is, and there may be situations in which the version cannot be deduced or the deduction is incorrect. This still leaves the problem of how do we get the Version, and we can use the logic in this PR to do that for the time being and see how that goes. But in the future it would be better to utilize some of the Discovery work for better ducks support and without us redefining the well understood k8s fields / names / semantics. The whole goal of the Discovery work is to solve the mapping of duck version to versions of that resource implementing that duck in a more generic as well as more discoverable manner overall.

I personally don't see how there could be any difficulties in that sense, because whatever version the user pushed, the version living in the server is only one. And as you pointed out, we have the annotation to get the duck type version, hence from my understanding this seems to me an orthogonal problem, not related to this particular issue, but due the fact that k8s allows storing a CR only with a specific version. At the end of the day, this issue is exploiting this particular aspect of CRDs design.

But I'm happy to be proven wrong and I'm open to analyze situations where this version resolution breaks.

The current behavior must always be supported.

+1

IMO allowing the APIVersion field to omit the version is going to cause issues because it's harder to know what the user's intent is, and APIVersion everywhere else in the k8s API means "Group + Version". Instead, I'd prefer to add an APIGroup field to the reference type (though in Go, this should probably be implemented as a separate type rather than modifying the existing KRef type) because that's a clearer statement of what the field is expected to contain, and a clearer statement about the user's intent. We (and sounds like @grantr) feel the best way forward is probably to not overload the semantic meaning of APIVersion, but to add Group (which is again a k8s standard concept) field into the KReference. We might need to special case the validation of KReference for resources whose controllers do this lookup.

Agree, making sure the user expectation is correct is definitely a good point. I like the idea of having a specific field for it.

I suggest working out the technical details and UX in an experimental feature of an alternate broker implementation rather than eventing core itself.

This is not possible without modifying the core apis, because brokers/channel implementations relies on core apis anyway. In some cases, vendors relies even on core controllers, like the subscription controller. Even if we move it forward as an experimental feature, we need to modify the core anyway.

So, given these, and if for example you use a standard way to parse these (like this one below) this would then mean that we'd have to special case around the special case due to k8s 'core' special handling.

Yep in my PR I exactly copied that logic and adapted it :smile:.

Proposal

Ok given this feedback, I propose one of the two following course of action:

OR:

Personally I prefer the first approach, less cloned code and, even if the type is in pkg, it shouldn't create problems to the rest of knative, because without modifying the CRD schemas, the field is unusable anyway, so stable APIs won't be affected. Where we want to test it (we can start from subscriptions and triggers first), we just use the experimental feature field validation to enable it and we modify the CRD schema to preserve unknown fields

n3wscott commented 3 years ago

Re: APIGroup please call it Group, this will match with the usage in CRDs.

slinkydeveloper commented 3 years ago

Yet another issue with api version here: https://github.com/knative-sandbox/eventing-kafka/issues/624

slinkydeveloper commented 3 years ago

I refactored the issue as experimental feature, including an implementation plan. Unless there are any particular objections, I'm going to move forward with the initial implementation and the inclusion process (check the issue description for more details).

slinkydeveloper commented 3 years ago

The implementation PRs for the inclusion are ready to review:

I'm working on the doc PR right now.

slinkydeveloper commented 3 years ago

And here is the doc PR: https://github.com/knative/docs/pull/3713

vaikas commented 3 years ago

The one thing that's not in the 'todo list' is that now every component that potentially does the resolving will need their RBAC rules updated to include listing/watching of the CRDs so as not to always hit the api server. At least I think that will be necessary, but maybe I'm wrong.

slinkydeveloper commented 3 years ago

The one thing that's not in the 'todo list' is that now every component that potentially does the resolving will need their RBAC rules updated to include listing/watching of the CRDs so as not to always hit the api server. At least I think that will be necessary, but maybe I'm wrong.

Yep, but it sounds like we already require it for other reasons https://github.com/knative/eventing/blame/main/config/core/roles/controller-clusterroles.yaml#L129, so no change there. Is that what you are referring to?

But yes in general, every controller executing KReference.ResolveGroup will require the get crd RBAC

vaikas commented 3 years ago

Right, for Subscription not necessary, but once this rolls out, say for example something like Sources will need this if it's injected. Not sure if it's only necessary if we flip the flag or if it gets injected and will prevent the controller from starting up. Just jotted it down as I was thinking about it :) We'll see.

slinkydeveloper commented 3 years ago

Ah I see what you mean... I'm not aware of any "dynamic" way for the controller to request RBAC at runtime (e.g. based on whether that flag is flipped or not), so the only viable solution I know is to request the RBAC for CRD anyway... Do you think this may be a problem?

vaikas commented 3 years ago

I'm not sure it's necessarily a problem, but what I was wondering is that once there's something in pkg that uses the CRD lister, then any code that depends on it might need to be updated with those permissions. Again, I'm not sure if that's the case, but just jotted it down. Should be easy enough to test on something that doesn't have that permission, bring your branch in and see if anything breaks? :) 🤞 that it just works.

If that is required, then we just need to do that planning work ahead and explain why those permissions are necessary. I think for the medium term (I think probably the right approach longer term is the Discovery?) which then would require some different permissions anyways :)

slinkydeveloper commented 3 years ago

but what I was wondering is that once there's something in pkg that uses the CRD lister, then any code that depends on it might need to be updated with those permissions.

I see what you mean, that shouldn't be a problem because, unless you use Kreference.ResolveGroup() directly, everything behaves as usual and validation won't need such permissions, so nothing should be touched by that.

Should be easy enough to test on something that doesn't have that permission, bring your branch in and see if anything breaks? :) crossed_fingers that it just works.

I can give it a spin with eventing-kafka-broker, which uses that kreference code (importing trigger spec) but doesn't require CRD RBAC

slinkydeveloper commented 3 years ago

@vaikas all E2E tests work properly here, so I think we're good: https://prow.knative.dev/view/gs/knative-prow/pr-logs/pull/knative-sandbox_eventing-kafka-broker/965/pull-knative-sandbox-eventing-kafka-broker-integration-tests/1400799445372637184

vaikas commented 3 years ago

Ok cool! Thanks for checking :)

slinkydeveloper commented 3 years ago

The feature it's been approved and merged for the inclusion!

antoineco commented 2 years ago

/unassign slinkydeveloper

github-actions[bot] commented 2 years ago

This issue is stale because it has been open for 90 days with no activity. It will automatically close after 30 more days of inactivity. Reopen the issue with /reopen. Mark the issue as fresh by adding the comment /remove-lifecycle stale.