kubernetes-sigs / gateway-api

Repository for the next iteration of composite service (e.g. Ingress) and load balancing APIs.
https://gateway-api.sigs.k8s.io
Apache License 2.0

Static/Unmanaged Gateway Mode (vs Provisioning) #1687

Closed · shaneutt closed this issue 1 month ago

shaneutt commented 1 year ago

(follow-up from GAMMA meeting 2023-01-31)

In the past we've had discussions about a mode of operation for Gateway API which we'll call "static mode" (previously called "unmanaged mode" in some older issues), wherein a Gateway resource refers to a gateway that has been provisioned by some other means (e.g. via Helm), and the implementation may or may not actually manage the lifecycle of that gateway.
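
For clarity, here is a rough sketch of what this looks like from the user's side; the GatewayClass name, controllerName, and listener are purely illustrative and not tied to any particular implementation:

```yaml
# Hypothetical static-mode setup: the controller reconciles routing config for
# a proxy that was installed separately (e.g. via Helm); it does not create
# the data plane itself.
apiVersion: gateway.networking.k8s.io/v1beta1
kind: GatewayClass
metadata:
  name: existing-proxy
spec:
  controllerName: example.com/static-gateway-controller
---
apiVersion: gateway.networking.k8s.io/v1beta1
kind: Gateway
metadata:
  name: my-gateway
spec:
  gatewayClassName: existing-proxy
  listeners:
  - name: http
    port: 80
    protocol: HTTP
```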

Multiple implementations now have options for such a mode, which can be particularly helpful for allowing legacy/pre-operator deployments to gain Gateway API functionality. Implementations that do this include (but are not limited to):

Today there are prospective implementations of GAMMA (such as Linkerd) which need such a mode, but currently that mode is a colloquialism: it is not properly documented or accounted for here in upstream Gateway API.

TODO:

Prior art: #892, #925

howardjohn commented 1 year ago

Here is how Istio does it: https://istio.io/latest/docs/tasks/traffic-management/ingress/gateway-api/#manual-deployment

The idea was to match how cloud LBs do it, where you point to an existing LB by address.
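
Roughly, per those docs, the Gateway is linked to the pre-existing Service by address; the namespace and Service hostname below are just examples:

```yaml
apiVersion: gateway.networking.k8s.io/v1beta1
kind: Gateway
metadata:
  name: gateway
  namespace: istio-ingress
spec:
  gatewayClassName: istio
  addresses:
  # Points at the manually deployed ingress Service instead of asking the
  # controller to provision one.
  - value: ingress.istio-gateways.svc.cluster.local
    type: Hostname
  listeners:
  - name: default
    port: 80
    protocol: HTTP
```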

shaneutt commented 1 year ago

Thanks @howardjohn!

cc @kflynn

sunjayBhatia commented 1 year ago

similarly here for Contour: https://projectcontour.io/docs/v1.23.2/guides/gateway-api/#option-1-statically-provisioned

mlavacca commented 1 year ago

This is how Kong deals with it: https://docs.konghq.com/kubernetes-ingress-controller/latest/concepts/gateway-api/#binding-kong-gateway-to-a-gateway-resource

Instead of relying on static/unmanaged gateways, we use static/unmanaged gatewayClasses, i.e., a Gateway is considered unmanaged if it uses an unmanaged GatewayClass.
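
As a rough sketch of that pattern (the annotation key comes from the docs linked above; the controllerName and listener details here are illustrative):

```yaml
# Marking the GatewayClass as unmanaged tells the controller to attach to the
# existing Kong deployment rather than provision one.
apiVersion: gateway.networking.k8s.io/v1beta1
kind: GatewayClass
metadata:
  name: kong
  annotations:
    konghq.com/gatewayclass-unmanaged: "true"
spec:
  controllerName: konghq.com/kic-gateway-controller
---
apiVersion: gateway.networking.k8s.io/v1beta1
kind: Gateway
metadata:
  name: kong
spec:
  gatewayClassName: kong
  listeners:
  - name: proxy
    port: 80
    protocol: HTTP
```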

kate-osborn commented 1 year ago

NGINX only supports "static" mode. We support one Gateway resource that is provisioned and managed by the user. We'd love to be able to run the conformance tests against our implementation via a static mode option or something similar.

kflynn commented 1 year ago

I've run into this with both Emissary-ingress and Linkerd. The Emissary-ingress case is simplest to describe, so I'll do that first: like NGINX as described by @kate-osborn, Emissary only supports "static" mode. It would be lovely to be able to have Emissary pass conformance, and considerably less lovely to have to completely change how Emissary works to do so.

The Linkerd case is one I saw firsthand when getting Envoy Gateway and Linkerd to play nicely together, so let me use that experience as an illustration. Please note that I'm not trying to argue that either Envoy Gateway or Linkerd is "wrong" here -- rather, I'm illustrating a point of friction.

The important things to know about Linkerd here are that:

Using an ingress installed using "static mode" with Linkerd is really easy: you install the ingress using your favorite installation method, make sure its Pods get the linkerd.io/inject annotation, and off you go. How exactly you apply the annotation depends on the installation mechanism, but they're all basically the same in the end.

If your ingress doesn't support "static mode", it's harder. In Envoy Gateway's case, it has Envoy proxy Pods that need the annotation, but that get recreated any time you touch the Gateway resource. Since you don't have control over the life cycle of those Pods, and you also don't have access to the Envoy Gateway Deployment, you have to annotate the envoy-gateway-system namespace to get Envoy Gateway into the mesh. Functionally, this is fine. Operationally, it's extra friction for this particular ingress: it would be simpler for many end users to give them the ability to just install Envoy Gateway statically.
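
To make that concrete, the two situations look roughly like this (names and images are illustrative):

```yaml
# Static install: annotate the ingress Deployment's pod template directly.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-ingress
spec:
  selector:
    matchLabels:
      app: my-ingress
  template:
    metadata:
      labels:
        app: my-ingress
      annotations:
        linkerd.io/inject: enabled
    spec:
      containers:
      - name: proxy
        image: example.com/my-ingress:latest
---
# Dynamically provisioned proxies (the Envoy Gateway case): annotate the
# namespace instead, since the controller recreates the Pods.
apiVersion: v1
kind: Namespace
metadata:
  name: envoy-gateway-system
  annotations:
    linkerd.io/inject: enabled
```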

howardjohn commented 1 year ago

FWIW Istio forwards the annotations from Gateway to the generated resources for similar reasons
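
For example, something along these lines should end up on the generated resources, assuming the forwarding works as described (the Gateway name is illustrative):

```yaml
apiVersion: gateway.networking.k8s.io/v1beta1
kind: Gateway
metadata:
  name: mesh-friendly
  annotations:
    linkerd.io/inject: enabled  # forwarded to the generated resources
spec:
  gatewayClassName: istio
  listeners:
  - name: http
    port: 80
    protocol: HTTP
```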

kate-osborn commented 1 year ago

@shaneutt based on the discussions, it seems like there's a fair amount of interest in this work. How can we move this issue forward? What are the next steps?

Nginx is eager to be able to run the conformance tests against its implementation and is willing to volunteer for this work.

shaneutt commented 1 year ago

> @shaneutt based on the discussions, it seems like there's a fair amount of interest in this work. How can we move this issue forward? What are the next steps?
>
> Nginx is eager to be able to run the conformance tests against its implementation and is willing to volunteer for this work.

At this point we consider this accepted and have marked it on our Road to GA project board as something we'd like done for GA. Currently it needs a champion, someone to drive it forward and get the content (mostly, it would seem, documentation) submitted. Whoever wants to do that should check out https://github.com/kubernetes-sigs/gateway-api/pull/1757 and probably touch base with @howardjohn as well, as the GEP he started there is related.

shaneutt commented 1 year ago

/help

k8s-ci-robot commented 1 year ago

@shaneutt: This request has been marked as needing help from a contributor.

Guidelines

Please ensure that the issue body includes answers to the following questions:

For more details on the requirements of such an issue, please see here and ensure that they are met.

If this request no longer meets these requirements, the label can be removed by commenting with the /remove-help command.

In response to [this](https://github.com/kubernetes-sigs/gateway-api/issues/1687):

> /help

pleshakov commented 1 year ago

What do people think about the following approach to support the static Gateway mode in the conformance tests?

In static mode, an implementation may only support one Gateway, so the following proposal updates the conformance tests to support that mode while preserving support for the mode with multiple Gateways:

  1. Reduce the number of Gateways deployed by default to one:

    1. Do not deploy all-namespaces and backend-namespaces.
    2. Deploy the existing same-namespace.

    Note: Most of the existing tests either deploy their own Gateways or use the same-namespace Gateway.

  2. In all tests that deploy dedicated Gateways, where possible, collapse their listeners into a single Gateway (a sketch follows this list). Those tests will need to remove the same-namespace Gateway, deploy their test-specific Gateway, and then restore the same-namespace Gateway upon completion.
  3. For tests that cannot be updated to use a single Gateway, group them under a feature, DynamicGateways, so that they can be easily disabled.
  4. All tests that use only the same-namespace Gateway will continue to use it and will not need any modification.
  5. If necessary, add a new test (or tests) that exercises support for multiple Gateways.
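
To make item 2 concrete: a test that today deploys two dedicated single-listener Gateways could instead deploy one Gateway carrying both listeners, for example (listener names and hostnames are illustrative; the placeholder class name follows the existing base manifests):

```yaml
apiVersion: gateway.networking.k8s.io/v1beta1
kind: Gateway
metadata:
  # Temporarily replaces same-namespace for the duration of this test.
  name: test-gateway
  namespace: gateway-conformance-infra
spec:
  gatewayClassName: "{GATEWAY_CLASS_NAME}"
  listeners:
  - name: listener-a
    port: 80
    protocol: HTTP
    hostname: "a.example.com"
    allowedRoutes:
      namespaces:
        from: Same
  - name: listener-b
    port: 80
    protocol: HTTP
    hostname: "b.example.com"
    allowedRoutes:
      namespaces:
        from: Same
```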

Below is the list of existing tests (based on commit 6efcfdf8aff7aa7c3ed8deb956e5b93eed779234) along with the Gateways they use:

| Test | # of Gateways |
| --- | --- |
| GatewayInvalidRouteKind | 2 (dedicated) |
| GatewayInvalidTLSConfiguration | 4 (dedicated) |
| GatewayModifyListeners | 2 (dedicated) |
| GatewayObservedGenerationBump | 1 (dedicated) |
| GatewaySecretInvalidReferenceGrant | 1 (dedicated) |
| GatewaySecretMissingReferenceGrant | 1 (dedicated) |
| GatewaySecretReferenceGrantAllInNamespace | 1 (dedicated) |
| GatewaySecretReferenceGrantSpecific | 1 (dedicated) |
| GatewayWithAttachedRoutes | 2 (dedicated) |
| GatewayClassObservedGenerationBump | 0 |
| HTTPRouteCrossNamespace | 1 (relies on the base Gateway backend-namespaces) |
| HTTPExactPathMatching, HTTPRouteHeaderMatching, HTTPRouteInvalidNonExistentBackendRef, HTTPRouteInvalidBackendRefUnknownKind, HTTPRouteInvalidCrossNamespaceBackendRef, HTTPRouteInvalidCrossNamespaceParentRef, HTTPRouteInvalidParentRefNotMatchingListenerPort, HTTPRouteInvalidParentRefNotMatchingSectionName, HTTPRouteMatching, HTTPRouteMatchingAcrossRoutes, HTTPRouteMethodMatching, HTTPRouteObservedGenerationBump, HTTPRoutePartiallyInvalidViaInvalidReferenceGrant, HTTPRoutePathMatchOrder, HTTPRouteQueryParamMatching, HTTPRouteRedirectHostAndStatus, HTTPRouteRedirectPath, HTTPRouteRedirectPort, HTTPRouteRedirectScheme, HTTPRouteReferenceGrant, HTTPRouteRequestHeaderModifier, HTTPRouteResponseHeaderModifier, HTTPRouteRewriteHost, HTTPRouteRewritePath, HTTPRouteSimpleSameNamespace | 1 (relies on the base Gateway same-namespace) |
| HTTPRouteDisallowedKind | 1 (dedicated) |
| HTTPRouteHostnameIntersection | 1 (dedicated) |
| HTTPRouteListenerHostnameMatching | 1 (dedicated) |
| TLSRouteSimpleSameNamespace | 1 (dedicated) |

shaneutt commented 1 year ago

@mlavacca curious as to your thoughts on this, considering our current static implementation?

mlavacca commented 1 year ago

> In all tests that deploy dedicated Gateways, where possible, collapse their listeners into a single Gateway. Those tests will need to remove the same-namespace Gateway, deploy their test-specific Gateway, and then restore the same-namespace Gateway upon completion.

I'm concerned about this approach: what happens when tests are run in parallel? There is a high probability of tests failing because the Gateway they need has been deleted by another test that is running.

I lean toward having a separate test suite for static tests, with no Gateways deployed beforehand and each test deploying its own specific Gateway. It is certainly a more verbose solution with some duplicated code (we could try to share test bodies), but refactoring the whole conformance test suite at this level looks massive.

mlavacca commented 1 year ago

@pleshakov I'd like to share that Kong Ingress Controller supports multiple Gateways in static mode (without dataplane provisioning) by merging them all into a single dataplane configuration. This way, we are able to run the conformance test suite, and there is no need for a static mode configuration. Maybe this solution could also fit NGINX's needs.

youngnick commented 1 year ago

As I mentioned in the meeting yesterday, I think there are a couple of other relevant concerns here:

pleshakov commented 1 year ago

@mlavacca

> I'm concerned about this approach: what happens when tests are run in parallel? There is a high probability of tests failing because the Gateway they need has been deleted by another test that is running.

That makes sense; I hadn't considered parallel execution.

> I'd like to share that Kong Ingress Controller supports multiple Gateways in static mode (without dataplane provisioning) by merging them all into a single dataplane configuration. This way, we are able to run the conformance test suite, and there is no need for a static mode configuration. Maybe this solution could also fit NGINX's needs.

Thanks for sharing this approach. I think it can work (especially since it already works successfully for you); however, if any new test starts creating conflicting rules -- for example, different protocols on the same port in different Gateways, or the same routing rules for the same ports on different Gateways routing to different destinations -- the approach will not work for that test.
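
A hypothetical conflicting pair, to illustrate (names and ports are made up):

```yaml
# Merged into a single dataplane, these two Gateways would both claim port
# 8443 with different protocols, which cannot be reconciled.
apiVersion: gateway.networking.k8s.io/v1beta1
kind: Gateway
metadata:
  name: gateway-a
spec:
  gatewayClassName: example
  listeners:
  - name: https
    port: 8443
    protocol: HTTPS
    tls:
      certificateRefs:
      - name: example-cert
---
apiVersion: gateway.networking.k8s.io/v1beta1
kind: Gateway
metadata:
  name: gateway-b
spec:
  gatewayClassName: example
  listeners:
  - name: tcp
    port: 8443
    protocol: TCP
```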

pleshakov commented 1 year ago

I'd like to propose another approach that can enable the static mode in the conformance tests. I wonder what people think.

Compared to the previous proposal above, it doesn't require updating either the test code or the test manifests, only the common conformance suite code. The main idea is: for every new Gateway, use a different GatewayClass.

Below is how it works:

This is a POC (just to show it can work) -- https://github.com/pleshakov/gateway-api/commit/198877b558014edcbbc990b5ff8dc5ab49b3b0fc

Note: For dynamic Gateways, nothing changes -- the tests will continue to use a single GatewayClass as they do now.
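
As a rough sketch of the idea (the POC above has the real details; names here are illustrative), each Gateway the suite creates would get its own GatewayClass, all pointing at the same controllerName, so a static-mode implementation can back each class with its own pre-provisioned proxy:

```yaml
apiVersion: gateway.networking.k8s.io/v1beta1
kind: GatewayClass
metadata:
  name: conformance-gateway-1
spec:
  controllerName: example.com/gateway-controller
---
apiVersion: gateway.networking.k8s.io/v1beta1
kind: Gateway
metadata:
  name: same-namespace
  namespace: gateway-conformance-infra
spec:
  # Each Gateway references its own dedicated class rather than the single
  # shared class used today.
  gatewayClassName: conformance-gateway-1
  listeners:
  - name: http
    port: 80
    protocol: HTTP
    allowedRoutes:
      namespaces:
        from: Same
```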

pleshakov commented 1 year ago

The proposed approach above was discussed during the community meeting on May 8, and the group suggested not going with it.

The suggested way forward (hope I got it right):

cc @youngnick

shaneutt commented 1 year ago

We still want this, but it doesn't need to be done for v1.0.0; we can do this after the GA release.

k8s-triage-robot commented 10 months ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

You can:

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot commented 9 months ago

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

You can:

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot commented 8 months ago

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed

You can:

- Reopen this issue with `/reopen`
- Mark this issue as fresh with `/remove-lifecycle rotten`
- Offer to help out with [Issue Triage](https://www.kubernetes.dev/docs/guide/issue-triage/)

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

k8s-ci-robot commented 8 months ago

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to [this](https://github.com/kubernetes-sigs/gateway-api/issues/1687#issuecomment-2016702301):

> /close not-planned

mikemorris commented 6 months ago

/reopen

I'd like to reopen this, as it seems several implementations have a need for this functionality - we all have slightly different ways of enabling this currently (and different intersecting features), but it feels like a common solution should be possible, and this feature would be complementary to (but not dependent on) the "gateway merging" standardization happening in https://github.com/kubernetes-sigs/gateway-api/pull/1863.

We'd like to solve this within the spec in the interest of being able to report full conformance for Azure's Application Gateway for Containers implementation, and can commit some time to putting together a proposal and implementation - we'd be interested in collaborating with anyone similarly motivated to standardize this functionality!

Two approaches we're currently considering are:

I'd prefer to stay away from using spec.addresses as the sole means of attaching to the static infrastructure because of its utility in specifying static or dynamic IP addresses.
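
For reference, the address-based attachment pattern that several of the implementations below use looks roughly like this; the class name and IP are placeholders:

```yaml
apiVersion: gateway.networking.k8s.io/v1beta1
kind: Gateway
metadata:
  name: static-attach
spec:
  gatewayClassName: example
  addresses:
  # A pre-provisioned static IP (or hostname) identifying the existing
  # infrastructure the Gateway should bind to.
  - type: IPAddress
    value: 203.0.113.10
  listeners:
  - name: http
    port: 80
    protocol: HTTP
```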

I've tried to pull together an overview of the different implementations discussed in this thread so far - please feel free to correct any mistakes or omissions!

### AKS

**Cluster Locality:** External
**Gateway Merging:** 🗺️ (on roadmap)
**Dynamic Infrastructure Provisioning:** ❌
**Mechanism:** Binds to an existing AGC load balancer _directly_ by specifying in an annotation the namespace and name of an ApplicationLoadBalancer custom resource (when managed from the same cluster), or by specifying in an annotation the resource ID of an AGC load balancer (if unmanaged, or managed by a controller in a different cluster). Can either bind to an existing reserved static IP in `spec.addresses`, or dynamically provision a static IP on the static infrastructure by leaving `spec.addresses` empty.

### GKE

**Cluster Locality:** External
**Gateway Merging:** ✅
**Dynamic Infrastructure Provisioning:** ✅
**Mechanism:** Binds to an existing GKE gateway _indirectly_ by specifying in `spec.addresses` an _external_ static IP attached to the GKE gateway. Can dynamically provision infrastructure by leaving `spec.addresses` empty.

### Istio

**Cluster Locality:** Internal
**Gateway Merging:** ✅
**Dynamic Infrastructure Provisioning:** ✅
**Mechanism:** Binds to an existing Istio ingress gateway _directly_ by specifying the ingress hostname in `spec.addresses`. Can dynamically provision infrastructure by leaving `spec.addresses` empty.

### Linkerd

**Cluster Locality:** Internal
**Gateway Merging:** Ingress-dependent?
**Dynamic Infrastructure Provisioning:** ✅
**Mechanism:** Binds to an existing ingress gateway by adding an annotation to pods to add them into the mesh. More difficult if pods are created dynamically.

### Contour

**Cluster Locality:** Internal
**Gateway Merging:** ?
**Dynamic Infrastructure Provisioning:** ✅
**Mechanism:** Binds by specifying `spec.controllerName` in GatewayClass - target name can be customized in static infrastructure config.

### Kong

**Cluster Locality:** Internal
**Gateway Merging:** ✅
**Dynamic Infrastructure Provisioning:** ✅
**Mechanism:** Binds by specifying `spec.controllerName` in GatewayClass - target name can be customized in static infrastructure config. Requires setting a `konghq.com/gatewayclass-unmanaged=true` annotation to attach to static infrastructure. Dynamic provisioning requires deploying an operator and creating a GatewayClass with the `konghq.com/gateway-operator` controllerName.

### NGINX

**Cluster Locality:** Internal?
**Gateway Merging:** ❌
**Dynamic Infrastructure Provisioning:** ❌
**Mechanism:** ?

### Emissary-ingress

**Cluster Locality:** Internal
**Gateway Merging:** ?
**Dynamic Infrastructure Provisioning:** ❌
**Mechanism:** ?

/cc @snehachhabria @robscott @howardjohn @kflynn @sunjayBhatia @mlavacca @shaneutt @kate-osborn

k8s-ci-robot commented 6 months ago

@mikemorris: Reopened this issue.

In response to [this](https://github.com/kubernetes-sigs/gateway-api/issues/1687#issuecomment-2105405021):

> /reopen

shaneutt commented 6 months ago

@mikemorris you can mark Kong as :heavy_check_mark: for Dynamic Infrastructure Provisioning. We have "unmanaged" mode if you're using our historical ingress controller directly (see https://github.com/kong/kubernetes-ingress-controller), or provisioned/managed mode if you use our operator (see https://github.com/kong/gateway-operator).

shaneutt commented 6 months ago

Appreciate your thoughtful update and your desire to see this one move forward. Let's un-rotten it:

/remove-lifecycle rotten

k8s-triage-robot commented 3 months ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

You can:

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot commented 2 months ago

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

You can:

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot commented 1 month ago

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed

You can:

- Reopen this issue with `/reopen`
- Mark this issue as fresh with `/remove-lifecycle rotten`
- Offer to help out with [Issue Triage](https://www.kubernetes.dev/docs/guide/issue-triage/)

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

k8s-ci-robot commented 1 month ago

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to [this](https://github.com/kubernetes-sigs/gateway-api/issues/1687#issuecomment-2438949902):

> /close not-planned