It looks like our networking conformance tests have been pretty flaky: https://testgrid.k8s.io/r/knative-own-testgrid/net-contour#continuous
Running some tests locally, I see the Envoy containers restarting around the time the 503 errors occur.
Note: Envoy returns the following status codes:
503/unknown cluster => there are no backing endpoints (link)
404/no cluster match for url => there's not backing (link)
The Envoy pods restart because their liveness probes time out. Digging a bit, it looks like we don't have resource requests/limits set on our Envoy DaemonSet pods.
We should probably create a ytt overlay that adds these requests/limits so that CPU is reserved for Envoy.
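Something along these lines might work. This is an untested sketch: it assumes the upstream Contour manifests name both the DaemonSet and its proxy container `envoy`, and the CPU/memory numbers are placeholders I picked for illustration, not values we've measured.

```yaml
#@ load("@ytt:overlay", "overlay")

#! Match every DaemonSet named "envoy" (assumed name; there may be one per namespace).
#@overlay/match by=overlay.subset({"kind": "DaemonSet", "metadata": {"name": "envoy"}}), expect="1+"
---
spec:
  template:
    spec:
      containers:
      #! Select the proxy container by its name (assumed to be "envoy").
      #@overlay/match by="name"
      - name: envoy
        #! Add the resources block if the upstream manifest doesn't define one.
        #@overlay/match missing_ok=True
        resources:
          requests:
            cpu: 200m       #! placeholder value
            memory: 128Mi   #! placeholder value
          limits:
            cpu: "1"        #! placeholder value
            memory: 256Mi   #! placeholder value
```

Setting a CPU request is the part that actually reserves capacity for the pod at scheduling time; whether we also want a CPU limit is debatable, since a tight limit can throttle Envoy and make the probe timeouts worse.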
The overlays are applied here: https://github.com/knative-sandbox/net-contour/blob/2db64e2d558a32fe550e1f478bf0ce76e98a3673/hack/update-deps.sh#L56-L68