Open dprotaso opened 1 year ago
Just to point out, while Knative is using and testing this functionality more aggressively than most implementations, every user of the Gateway API (including tools like https://flagger.app/) will be trying to perform these operations:
```yaml
apiVersion: gateway.networking.k8s.io/v1beta1
kind: HTTPRoute
metadata:
  name: http-filter-1
spec:
  hostnames:
  - my.site.com
  rules:
  - backendRefs:
    - name: my-app-blue
      port: 80
```
Then update the route to point to a second backend:
```yaml
apiVersion: gateway.networking.k8s.io/v1beta1
kind: HTTPRoute
metadata:
  name: http-filter-1
spec:
  hostnames:
  - my.site.com
  rules:
  - backendRefs:
    - name: my-app-green
      port: 80
```
Note that Knative's networking tests perform this type of configuration change multiple times over a test lasting a few minutes, while maintaining at least one concurrent request in flight, which may be more sensitive than monitoring that only probes once every few minutes.
I think (1) is valid, but (2) is not, IMO. This actually doesn't even work in Kubernetes core with just basic Service routing. The Endpoint being generated is the signal to kube-proxy to start serving it; just because it's there doesn't mean it's immediately available to consumers.
If your client detects the endpoint and sends traffic to the Service faster than kube-proxy is programmed, you will see dropped traffic.
A `Ready` condition at the HTTPRoute level that the client can wait on until `Ready=True` would solve 1. & 2.
For the case when the endpoint/backend is transitioning from Ready to Terminating, some form of health checks will be needed (passive/outlier detection+retries or active) to eliminate any traffic loss.
sidenote - @dprotaso @evankanderson are the above tests defined programmatically, so that implementations can run them?
(1) should be enforced without any status - changes to a route should be zero downtime or routes are effectively immutable, IMO.
I believe the update test in https://github.com/knative-sandbox/net-gateway-api/tree/main#getting-started should work for that. Knative currently only tests Istio and Contour (and has some gaps for each!), but the code in question for the update test lives here:
https://github.com/knative/networking/blob/main/test/conformance/ingress/update.go
And yes, I'm especially concerned about failure mode 1 (as described in my earlier comment), because the only "workaround" is to shift load via DNS to a different cluster prior to making routing changes, which isn't really scalable.
For (2), does it seem reasonable to block the data-plane routing updates until the Service backends are resolved in the gateway api control plane? This would be more complex, as the control plane would need to define the new backend endpoints and wait for the data plane to complete a health check before updating the routing to point to the new backend.
For newly-created routes, a `Ready` condition is helpful, but when we're talking about updating existing HTTPRoutes we need to smoothly transition from one valid state to a new valid state without going through a "drop some traffic" state.
> For (2), does it seem reasonable to block the data-plane routing updates until the Service backends are resolved in the gateway api control plane? This would be more complex, as the control plane would need to define the new backend endpoints and wait for the data plane to complete a health check before updating the routing to point to the new backend.
In a vacuum, sort of. I think it's hard to do, or at least it would be in Istio, which is what I am familiar with, and at least harder than it sounds.
We don't want to block all updates, just updates to that specific route. So that means either the control or data plane needs to keep both copies. We also need to know when we are in the "waiting for backend" state, because we don't want to wait forever, I'd imagine (or do we? Even on a new route, do we not apply it until all backends are healthy?). So we need to be fairly stateful. This is plausible, but may not be robust - we aren't (realistically) going to become truly stateful, so if the control plane restarts during this operation we may suddenly flip to the unhealthy backend; perhaps that's an edge case we can live with.
To handle it, we also need a mechanism to know "is the backend ready" which is more robust than the k8s state and is actually based on the underlying data plane. That is possible to determine in general with Envoy, but becomes tricky in practice. Do we need all instances of the data plane to be ready, or only some of them? We may also have multiple control plane replicas, so now we need to share this state between control plane instances. What if we are a multi-cluster mesh - do we need all control planes to share with all other control planes and ensure 100% of all data plane instances across the globe have the change before we shift? Can a single Envoy instance being down (in the global view) block all updates?
For triage:
/assign @arkodg
@dprotaso would a conformance test for 1) be a good next step ?
The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:
- After a period of inactivity, `lifecycle/stale` is applied
- After a further period of inactivity once `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After a further period of inactivity once `lifecycle/rotten` was applied, the issue is closed

You can:
- Mark this issue as fresh with `/remove-lifecycle stale`
- Close this issue with `/close`

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale
/remove-lifecycle stale
/lifecycle frozen
I've raised https://github.com/kubernetes-sigs/gateway-api/issues/2303 to solve 1., which IMO is a test implementations should strive to achieve.
The `Programmed` status condition in the HTTPRoute cannot help here, because we cannot guarantee that the route has actually been programmed in the data plane. `Ready`, if implemented by the controller, could in theory be used to guarantee that the routes in the data plane are ready, but this is a hard problem to solve for most implementations.

Instead of focusing on what information is needed by a client before it starts sending requests (to guarantee no traffic drops), we can instead come up with a max duration within which a controller should program routes.
Here are the example steps provided in the original description:

1. Spin up a deployment and service
2. Wait for a healthy endpoint to appear (by polling the API server)
3. Update an HTTPRoute to point to the new backend
4. Repeat 1-3 about 10x

A conformance test can be written that waits for some duration (let's say 1s) between step 3 and step 4, but expects no failures at step 4.
Hoping to hear from others regarding this. At this point all conformance tests focus on correctness rather than performance, and this one would fall under the performance bucket, IMO.
I'm pretty reluctant to add any performance-based conformance test before GA, I think that this one in particular is very easy to get wrong, both from the conformance and the implementation sides. I also don't know if we want to mandate zero-packet-loss backend changes - that seems like it could be a high bar for some implementations. Maybe when we do add performance-style testing, we extend the testing result YAML to include the numbers (packets lost in this case)?
Again, I don't think we can move on @arkodg's suggestion before GA, sadly; there are too many sharp edges.
It would be good to discuss what guarantees implementors can provide during an update of an `xRoute`. Specifically, in Knative we've seen ingresses drop traffic when transitioning from one backend to another - even when both backends have healthy endpoints.

The situations in which I've seen this occur are:
1. Ingress doesn't track backend endpoints that aren't reachable via routes
You have two healthy backends A & B
When a route is updated from pointing to backend A to backend B the proxy has to page in backend B's endpoints. This unfortunately races against the route config being rolled out. Ingress drops traffic because endpoints aren't present and the route isn't pointing to backend A anymore.
2. Newly deployed healthy backends aren't safe for Routes to reference
Even if an ingress solution tracks all endpoints there's a small window where traffic can be dropped when switching to a newly deployed backend C.
In Knative we have a test where we continuously probe an HTTPRoute while performing the steps listed earlier.
The race here is the test observes and updates the HTTPRoute's backend faster than the ingress proxy can process/track the new endpoint.