knative / serving

Kubernetes-based, scale-to-zero, request-driven compute
https://knative.dev/docs/serving/
Apache License 2.0
5.55k stars 1.16k forks source link

SinkBinding doesn't work with KServices that uses BYO revision names #9544

Open rhuss opened 4 years ago

rhuss commented 4 years ago

Describe the bug

When using a SinkBinding with a Knative Service as subject that uses BYO revision names (i.e. setting the revision name from the client), then no K_SINK environment variable is injected to the KService.

Expected behavior

K_SINK should be injected to the Pod Template as a container env var regardless whether a Knative Service is configured for BYO revision names or not.

To Reproduce

kn service create random --image rhuss/random:1.0
kn broker create default
kn source binding create my-sb --subject Service:serving.knative.dev/v1:random --sink broker:default
kn service describe random --verbose | grep K_SINK

This will get no result as kn used BYO revision names by default. If switching over to server side generated names, then a K_SINK env var is injected.

# Switch over to server side generated names
kn service update random --revision-name ""
kn service describe random --verbose | grep K_SINK

        Env:    K_SINK=http://broker-ingress.knative-eventing.svc.cluster.local/default/default

Knative release version

Eventing 0.17.3 (running on minikube)

rhuss commented 4 years ago

See also discussion at https://knative.slack.com/archives/C9JP909F0/p1600876741003800

evankanderson commented 4 years ago

This is probably a serving, rather than eventing, issue -- it would apply equally to a hypothetical OsbBinding which set the VCAP_SERVICES environment variable on a Knative Service which was in the "BYO revision" mode.

I wonder if the best solution would be to clear spec.template.metadata.name on the resource before applying the patch, or adding a default to clear that field into the patch if the patch doesn't explicitly manage that field.

@mattmoor @dprotaso for thoughts.

mattmoor commented 3 years ago

Ultimately, different PodSpecables present different restrictions. For some cases they can handle it gracefully, and for others they should probably fail gracefully.

I'd characterize BYO name as one of our niche restrictions.

github-actions[bot] commented 3 years ago

This issue is stale because it has been open for 90 days with no activity. It will automatically close after 30 more days of inactivity. Reopen the issue with /reopen. Mark the issue as fresh by adding the comment /remove-lifecycle stale.

rhuss commented 3 years ago

I'd characterize BYO name as one of our niche restrictions.

Until 0.20, BYO name was the default mode for kn (don't remember why this was the case), we changed that now to server-side naming in 0.21. However, since we still support the older kn version and it is still easy to switch to BYO revision names with kn, maybe this should be supported by the binding reconciliation loop? E.g. one could easily detect BYO mode and, if enabled, just set the revision name to a new random name.

As an alternative, I would suggest deprecated BYO revision name and let it fade out. I really don't see any convincing argument for BYO names, especially if it can not be ruled out, that only a single client is used for manipulating a KService.

I'm reopening this issue, just to track this open question.

cardil commented 3 years ago

I hit that one with kn 0.19 and serving 0.21. Maybe do reopen as you said @rhuss ?

/reopen /remove-lifecycle stale

knative-prow-robot commented 3 years ago

@cardil: Reopened this issue.

In response to [this](https://github.com/knative/serving/issues/9544#issuecomment-805911504): >I hit that one with kn 0.19 and serving 0.21. Maybe do reopen as you said @rhuss ? > >/reopen >/remove-lifecycle stale Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.
markusthoemmes commented 3 years ago

Is there anything left to be done here? The above sounds like this is working as designed and we'd at most document the behavior better and/or improve visibility of the issue?

rhuss commented 3 years ago

Well, I would say it depends: If BYO revisions is a fully supported feature by Knative Serving (even when it is a "niche feature") I would expect it to either work with SinkBindings or would have a documentation that BYO revision names are not supported for SinkBindings.

An alternative would be to remove BYO revision names as I think this feature is not very well tested, too ? (see this bug :)

github-actions[bot] commented 3 years ago

This issue is stale because it has been open for 90 days with no activity. It will automatically close after 30 more days of inactivity. Reopen the issue with /reopen. Mark the issue as fresh by adding the comment /remove-lifecycle stale.

cardil commented 3 years ago

/remove-lifecycle stale

markusthoemmes commented 3 years ago

My understanding of SinkBinding is, that it also carries a reconciler which could really change the Knative Service at any point, kind of breaking the immutability requirement of revisions if we were to somehow bypass the name check that lead to the initial report here.

IMO we have 2 options then:

  1. Keep things the way they are and clearly document that SinkBinding is not supported for BYO revision names.
  2. Drop BYO revision names altogether.
markusthoemmes commented 3 years ago

/area api

rhuss commented 3 years ago

+1 for dropping BYO revision names as they have more problems than they solve (and the client these days default to server-generated revision names anyway and don't leverage client-side revision names). See also the release notes of https://github.com/knative/client/releases/tag/v0.21.0 for some more arguments against BYO revision names.

duglin commented 3 years ago

I don't use BYO revision names much, but that's probably because I don't play with traffic splitting very often - beyond trivial demos. However, if we drop BYO support will that make it harder for people to setup and manage their traffic splitting? In particular, I often find it really annoying to work with systems that require me to create something, discover what I just created, and then use that info in a follow-on command. Not only because it should have let me pick a user-friendly name to begin with so I can reference it via scripts easily, but it also means that I have to (potentially) have my system in an undesirable state until I can execute that last step of that process. And if I have to jump through some hoops I normally would not have jumped through just to make things work, then that's not the best UX. So... what's the UX implications here if we drop BYO revision names for non-trivial traffic splitting?

rhuss commented 3 years ago

Actually, BYO as used by kn does not help you in this situation at all as the client chooses a random name for the revision on every update (which is even harder to predict). BYO names have tons of drawbacks, some of them are listed in these release notes https://github.com/knative/client/releases/tag/v0.21.0 . Another disadvantage is that you really depend on the state of your cluster when you e.g. choose a name that must not conflict with any other existing revision's name. So a script might work on one cluster but not on the other, depending on the cluster's history of revisions.

Said that there is good news as we exactly tackle this UX problem with the latest proposal for an updated UI for traffic split as described in this feature track

This proposal will allow you to create traffic splits without looking up revisions names for the most common use cases.

rhuss commented 3 years ago

I fail to see any advantage for BYO revision names, even for traffic splits it is not practical (as explained above).

cardil commented 3 years ago

I see users might like to use BYO revision name to equal their application version. In that way, they will know what they actually traffic split. It's easy for us, to use default revision numbered version (or even random one from before) as we just do examples and simple demos.

Imagine yourself being in some company, and releasing and deploying new SSO service, that other services and users are constantly using. Your service version is changing from 2.3.3-1 to 2.4.0-2, so you're bringing new features, but that involved numerous dependencies needed to bump. Despite testing this on testing and stage environments, you are not 100% sure it will work, as experience taught you that production data and load can cause many unforeseen problems. That's why your team decided to use Knative to traffic split every new release of such crucial services (any in fact :wink:).

So, you have your images: registry.acme.com/edge/sso:2.3.3-1 and registry.acme.com/edge/sso:2.4.0-2. Now, you are running 2.3.3-1 version, and you have 100% of traffic routed to revision named 2.3.3-1. You start the deployment of 2.4.0-2 version, and in the same kubectl apply you are setting 5% of load to hit new version. :crossed_fingers:

Of course, to do that, you needed to always use BYO revision and name it after your build version. That's tedious.

The traffic split FT @rhuss do not solve this, as it only is a CLI feature. In fact, it may make it harder (the --traffic 100% and --traffic 0%) to know for sure, what am I doing, actually. Imagine someone gave you a new terminal session to production k8s. Will you invoke --traffic 100% and --traffic 0% options without first checking what actually the state is?!?

I image that a company/team that takes traffic splitting seriously would invest in some wrapper that will perform deploys automatically. 5%, check logs for X min, 10%, check logs, 20%, check logs, ... Such thing could be written in anything, terraform code, an operator, or a web app. It's unrealistic to assume a human operator would use kn to deploy production services, and do the traffic splitting, log crunching loop.

We could do a better job in generating revision names, it would save some confusion. We could:

  1. Look at well-known image labels to read the revision name, or version, build number
  2. As a fall back, look at image tag, and use it as version number
  3. If none of the above are successful, use current ordered number revision names
  4. Use dynamic naming for revision name, like current, previous, n-2 in traffic splitting - both on CLI and YAML
  5. Allow users to add aliases to revisions, to be used in traffic splitting, apart from dynamic ones.
markusthoemmes commented 3 years ago

We could do a better job in generating revision names, it would save some confusion. We could:

It wouldn't help though. The problem in this case is that we're trying to update an immutable revision after it being created. If we adopt the naming strategy you suggest, the same issue would be around as the name of the revision with the same label (or image tag or whatever) would stay the same and thus collide in the same ways.

I agree with the description on where BYO names are useful tho.

cardil commented 3 years ago

One additional idea is that system components could have more permission than user does - meaning we might decide that SB can change a thing that is immutable for user.

Or maybe introduce a mutant revision, and just shift the revision name to it - the underlying revision names could be sequential, and we just apply a proper revision name via label.

rhuss commented 3 years ago

IMO you can't use an image version/tag as a revision name reliably as every Configuration change would need a new revision name, even when you don't change the image (but maybe only a label or env var). A revision name != an application version.

That has to be provided by the user. IMO it is far better to leverage the tag: parts of traffic split to give you a more indicative name. I still don't think it is practical for the client to provide the revision names without evaluating the cluster state before (e.g. if this revision name already exists before using it). This only works if you have full control and don't try any apply kind of operations. In any practical use case that I have encountered for client-side naming, there was already a random portion to somewhat reduce the risk of a naming collision (but still can happen). There is no benefit of server-side naming generation, that can guarantee to create a new unique revision name.

btw, with the CLI proposal mentioned above btw, you are still able to fully specify the traffic split without the help of a CLI to 'fill up to 100%' support.

github-actions[bot] commented 2 years ago

This issue is stale because it has been open for 90 days with no activity. It will automatically close after 30 more days of inactivity. Reopen the issue with /reopen. Mark the issue as fresh by adding the comment /remove-lifecycle stale.

rhuss commented 2 years ago

/remove-lifecycle stale

While I think it might not be necessary to fix SinkBindings to run with BYO revision names, I also think we should document the restriction somewhere, ideally at the place where the SinkBinding itself is documented.

github-actions[bot] commented 2 years ago

This issue is stale because it has been open for 90 days with no activity. It will automatically close after 30 more days of inactivity. Reopen the issue with /reopen. Mark the issue as fresh by adding the comment /remove-lifecycle stale.

abrennan89 commented 2 years ago

@rhuss if this is still relevant can you open a docs bug to add a note about this please?

cardil commented 2 years ago

/remove-lifecycle stale /reopen

@abrennan89 Couldn't we just reuse this issue as doc one?

knative-prow[bot] commented 2 years ago

@cardil: Reopened this issue.

In response to [this](https://github.com/knative/serving/issues/9544#issuecomment-1268209263): >/remove-lifecycle stale >/reopen > >@abrennan89 Couldn't we just reuse this issue as doc one? Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.
abrennan89 commented 2 years ago

@cardil if someone for serving is writing the docs then yeah that's fine, otherwise please close it and open an issue in the docs repo that provides specific details about what needs to change.

github-actions[bot] commented 1 year ago

This issue is stale because it has been open for 90 days with no activity. It will automatically close after 30 more days of inactivity. Reopen the issue with /reopen. Mark the issue as fresh by adding the comment /remove-lifecycle stale.

knative-prow-robot commented 1 year ago

This issue or pull request is stale because it has been open for 90 days with no activity.

This bot triages issues and PRs according to the following rules:

You can:

/lifecycle stale

rhuss commented 1 year ago

/remove-lifecycle stale