knative / serving

Kubernetes-based, scale-to-zero, request-driven compute
https://knative.dev/docs/serving/
Apache License 2.0
5.46k stars 1.14k forks source link

Feature Flag Routing #4736

Open duglin opened 4 years ago

duglin commented 4 years ago

In what area(s)?

/area API /area networking

/kind spec /kind proposal

Describe the feature

One of the things we hear from customers is that routing between revisions based on percentage isn't really of interest to them once in production. In practice they tend to lean more towards a model where only a select group of users are expected to use the latest versions so that everyone else (who need more stability) are not impacted during the testing phase. We call that "Feature Flag Routing". Where the routers (e.g. istio) route requests to certain versions based on some metadata in the message (e.g. perhaps some http header) rather than the random % based stuff. We'd like for Knative to eventually support this as an option. It's not clear if this can be done solely under the covers or whether it might result in an API change. If an API change might be needed it might be best to at least design it out prior to v1.0 when the API is locked down.

Some of the uses for the "select group":

mattmoor commented 4 years ago

If an API change might be needed it might be best to at least design it out prior to v1.0 when the API is locked down.

A nit, this is only true if a breaking API change is needed. The core kube APIs (e.g. Pod) have continued to evolve long past v1 to incorporate new (non-trivial) functionality. An example of this is projection of secrets and configmaps into environment variables, which didn't land until 1.3 (IIRC), and this is far from the last change.

Do you have a reason to think that this would require a breaking API change?

route requests to certain versions based on some metadata in the message (e.g. perhaps some http header)

I am keen to add a new abstraction that can do these kinds of things as well:

I've always seen this as a higher-level abstraction (atop Service) because often the things you are switching over are fairly different. e.g.

My feeling is that this should target Addressable things, which includes Service and Route, but opens the possibility for this to target other abstractions implementing Addressable. I also think it itself should be Addressable so it can be composed with consumers of Addressable (e.g. Eventing).

We should be careful about folding everything into a single abstraction as Istio's VirtualService has, as it can become incredibly hard to use properly. This is (part of) why my bias is towards much more focused and clear abstractions that can be composed.


Another notable challenge this introduces is that it will require an expansion of our networking interface. Istio's API can clearly handle it, but I'm unsure of Gloo (or others I have heard are in the works).

duglin commented 4 years ago

Do you have a reason to think that this would require a breaking API change?

Off the top of my head I see one potential area... the percent field. When not present it defaults to zero, so adding a new field to specify some other routing thingy might feel like a bit of a conflict. So, we may need to modify it to say "w/o any routing data at all, default is percent=0", instead of "if no percent, default is percent=0".

I agree with you that I think this is mostly a higher-level abstraction thing, and mainly something that can be additive after v1.0 if we don't make the timeline. But, I wanted to bring it up now for 2 reasons: 1) get a sense for interest among the community 2) make sure there's nothing in v1.0 that might cause an API conflict/breaking change to support this.

And so far, it's just that percent thing that's worrying me - which might not be too hard to address.

But I'd like to hear if others know of spots to think about....

markusthoemmes commented 4 years ago

I do agree with @mattmoore‘s point that this can be folded into a higher level abstraction.

My understanding of the need for percentage based traffic splitting is that we need to have a mechnism for unobstrusive rollouts. Percentage based traffic splitting allows us to do what deployment does with its rolling upgrade but also allows us to have immutable deployments (the rolling is a result of mutating the deployment).

Other means are undoubtedly super useful for various use-cases but maybe should not be part of the core types to keep them somewhat focused on what they need to be and to not further push requirements into the networking layer as Matt mentioned.

knative-housekeeping-robot commented 4 years ago

Issues go stale after 90 days of inactivity. Mark the issue as fresh by adding the comment /remove-lifecycle stale. Stale issues rot after an additional 30 days of inactivity and eventually close. If this issue is safe to close now please do so by adding the comment /close.

Send feedback to Knative Productivity Slack channel or file an issue in knative/test-infra.

/lifecycle stale

knative-housekeeping-robot commented 4 years ago

Stale issues rot after 30 days of inactivity. Mark the issue as fresh by adding the comment /remove-lifecycle rotten. Rotten issues close after an additional 30 days of inactivity. If this issue is safe to close now please do so by adding the comment /close.

Send feedback to Knative Productivity Slack channel or file an issue in knative/test-infra.

/lifecycle rotten

knative-housekeeping-robot commented 4 years ago

Rotten issues close after 30 days of inactivity. Reopen the issue with /reopen. Mark the issue as fresh by adding the comment /remove-lifecycle rotten.

Send feedback to Knative Productivity Slack channel or file an issue in knative/test-infra.

/close

knative-prow-robot commented 4 years ago

@knative-housekeeping-robot: Closing this issue.

In response to [this](https://github.com/knative/serving/issues/4736#issuecomment-590832781): >Rotten issues close after 30 days of inactivity. >Reopen the issue with `/reopen`. >Mark the issue as fresh by adding the comment `/remove-lifecycle rotten`. > >Send feedback to [Knative Productivity Slack channel](https://knative.slack.com/messages/CCSNR4FCH) or file an issue in [knative/test-infra](https://github.com/knative/test-infra/issues/new). > >/close Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.
duglin commented 4 years ago

@mattmoor (or anyone) can I get this reopened? I missed the lifecycle messages.

vagababov commented 4 years ago

/reopen

knative-prow-robot commented 4 years ago

@vagababov: Reopened this issue.

In response to [this](https://github.com/knative/serving/issues/4736#issuecomment-626255482): >/reopen Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.
duglin commented 4 years ago

@vagababov thanks!

knative-housekeeping-robot commented 4 years ago

Rotten issues close after 30 days of inactivity. Reopen the issue with /reopen. Mark the issue as fresh by adding the comment /remove-lifecycle rotten.

Send feedback to Knative Productivity Slack channel or file an issue in knative/test-infra.

/close

knative-prow-robot commented 4 years ago

@knative-housekeeping-robot: Closing this issue.

In response to [this](https://github.com/knative/serving/issues/4736#issuecomment-641243950): >Rotten issues close after 30 days of inactivity. >Reopen the issue with `/reopen`. >Mark the issue as fresh by adding the comment `/remove-lifecycle rotten`. > >Send feedback to [Knative Productivity Slack channel](https://knative.slack.com/messages/CCSNR4FCH) or file an issue in [knative/test-infra](https://github.com/knative/test-infra/issues/new). > >/close Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.
duglin commented 4 years ago

/reopen

knative-prow-robot commented 4 years ago

@duglin: Reopened this issue.

In response to [this](https://github.com/knative/serving/issues/4736#issuecomment-641321949): >/reopen Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.
knative-housekeeping-robot commented 3 years ago

Rotten issues close after 30 days of inactivity. Reopen the issue with /reopen. Mark the issue as fresh by adding the comment /remove-lifecycle rotten.

Send feedback to Knative Productivity Slack channel or file an issue in knative/test-infra.

/close

knative-prow-robot commented 3 years ago

@knative-housekeeping-robot: Closing this issue.

In response to [this](https://github.com/knative/serving/issues/4736#issuecomment-656407847): >Rotten issues close after 30 days of inactivity. >Reopen the issue with `/reopen`. >Mark the issue as fresh by adding the comment `/remove-lifecycle rotten`. > >Send feedback to [Knative Productivity Slack channel](https://knative.slack.com/messages/CCSNR4FCH) or file an issue in [knative/test-infra](https://github.com/knative/test-infra/issues/new). > >/close Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.
duglin commented 3 years ago

/reopen

knative-prow-robot commented 3 years ago

@duglin: Reopened this issue.

In response to [this](https://github.com/knative/serving/issues/4736#issuecomment-656436469): >/reopen Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.
aliok commented 3 years ago

Some user interest in this feature: https://stackoverflow.com/questions/63615721/knative-routing-based-on-custom-headers/63618289

mattmoor commented 3 years ago

/lifecycle frozen

This was sent to our mailing list as well. I'm a little reluctant to creep ksvc too much vs. adding new resources that compose well with ksvc (and others) in the same way we're trying to do vanity domains.

I sketched some thoughts here over the weekend, and have a PoC that does (part of) this: https://docs.google.com/document/d/1Cp_h4MIRGt2Vy-EE0Yy5L5Y0wRNaMyp6SUq0TCi6XLM/edit

evankanderson commented 3 years ago

Other than the request for the user to choose the request header, it seems like Header-based Tag Routing should be able to do this?

/triage needs-user-input

evankanderson commented 3 years ago

Assuming that header-based tag routing will work here unless someone wants to chime in with a reason why it won't.

/close

knative-prow-robot commented 3 years ago

@evankanderson: Closing this issue.

In response to [this](https://github.com/knative/serving/issues/4736#issuecomment-867165579): >Assuming that header-based tag routing will work here unless someone wants to chime in with a reason why it won't. > >/close Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.
duglin commented 3 years ago

/reopen actually, allowing other headers is kind of important since we can't mandate that the user the inject Knative header into each message. This is why we may want to route based on some other header that's already flowing, like an auth header. Also, it would be nice to route to a non-tagged revision - similar to how % based routing doesn't need tagged either. Plus, if I'm remembering correctly, with tags you're required to expose those to the internet - you can't make them private but then have the main URL public. So, if I want to only allow certain users access to a revision I wouldn't want it visible via a tag.

knative-prow-robot commented 3 years ago

@duglin: Reopened this issue.

In response to [this](https://github.com/knative/serving/issues/4736#issuecomment-867271469): >/reopen >actually, allowing other headers is kind of important since we can't mandate that the user the inject Knative header into each message. This is why we may want to route based on some other header that's already flowing, like an auth header. Also, it would be nice to route to a non-tagged revision - similar to how % based routing doesn't need tagged either. Plus, if I'm remembering correctly, with tags you're required to expose those to the internet - you can't make them private but then have the main URL public. So, if I want to only allow certain users access to a revision I wouldn't want it visible via a tag. Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.
mapleeit commented 2 years ago

Agree with @duglin It would help us a lot if supporting route traffic by custom headers, because in our case, we can't gurantee we can inject the Knative-Serving-Tag: rev1 header to every request. And it also expose the purpose of trying to treat somebody differently.

mkloveyy commented 11 months ago

Agree with @duglin, It would be helpful for us too.