dangarthwaite opened this issue 1 year ago
Currently Pomerium uses the Service Endpoints object, which is updated when Pods are terminated or new ones become Ready. That update takes a bit of time, and the delay is the root cause of the downtime.
One option to avoid the short downtime window today is to use the Kubernetes service proxy instead; see https://www.pomerium.com/docs/deploying/k8s/ingress#service-proxy
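As a rough sketch of what that looks like (the exact annotation name and port should be checked against the linked docs; the Ingress and Service names here are illustrative), service-proxy mode is enabled per Ingress with an annotation along these lines:

```yaml
# Sketch only: with this annotation Pomerium proxies to the Service's cluster IP
# (via kube-proxy) instead of individual endpoint IPs, so it is not affected by
# Endpoints update lag during a rollout. Annotation name assumed from the docs above.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: verify
  annotations:
    ingress.pomerium.com/service_proxy_upstream: "true"
spec:
  rules:
    - host: verify.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: verify
                port:
                  number: 8000
```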
In the long term, we should probably start using the newer EndpointSlice object, which takes Pod conditions into consideration: https://kubernetes.io/docs/concepts/services-networking/endpoint-slices/#conditions
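For reference, an EndpointSlice carries per-endpoint conditions (ready, serving, terminating) that a proxy can use to stop routing to a Pod before its address disappears entirely. A trimmed example object (names and addresses are made up) looks roughly like this:

```yaml
# Illustrative EndpointSlice; the important part is the per-endpoint conditions
# block, which the legacy Endpoints object does not expose in the same way.
apiVersion: discovery.k8s.io/v1
kind: EndpointSlice
metadata:
  name: verify-abc12
  labels:
    kubernetes.io/service-name: verify
addressType: IPv4
ports:
  - name: http
    protocol: TCP
    port: 8000
endpoints:
  - addresses:
      - "10.1.2.3"
    conditions:
      ready: true        # passing readiness checks; safe to route to
      serving: true
      terminating: false
  - addresses:
      - "10.1.2.4"
    conditions:
      ready: false       # being shut down during the rollout
      serving: true      # may still answer in-flight requests
      terminating: true
```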
What happened?
Doing a rollout restart of the verify service results in a small window of downtime.
What did you expect to happen?
Rollout restarts of a targeted application should result in zero failed requests.
How'd it happen?
What's your environment like?
What's your config.yaml?
What did you see in the logs?