High availability mode

Currently, restarting the operator also restarts the dataplane pods, which disconnects all active sessions. This is undesirable in high availability mode when the restart is caused by, e.g., a node failure.

There are probably two largely independent reasons for this.

First, the operator undergoes a shutdown cycle to invalidate Gateway API statuses before terminating. This must be disabled in high availability mode.

Second, the operator does not watch dataplane deployments. It blindly recreates the deployments in every rendering cycle and relies on the Kubernetes controller runtime to silently ignore updates that don't change anything. For some reason, this breaks while the operator is restarting.
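To illustrate the alternative to blind recreation, here is a minimal sketch of a rendering cycle that compares the rendered state with the existing state and skips the write when nothing changed. It is not the operator's actual code: `DeploymentSpec`, `Action` and `decide` are hypothetical names, and the struct is a simplified stand-in for a real Kubernetes Deployment.

```go
package main

import (
	"fmt"
	"reflect"
)

// DeploymentSpec is a hypothetical, simplified stand-in for the fields of a
// dataplane Deployment that the operator renders. It is an assumption made
// for this sketch, not a Kubernetes type.
type DeploymentSpec struct {
	Image    string
	Replicas int
	Args     []string
}

// Action names what a rendering cycle should do with one Deployment.
type Action string

const (
	ActionCreate Action = "create"
	ActionUpdate Action = "update"
	ActionNone   Action = "none"
)

// decide compares the rendered spec with the spec currently in the cluster.
// current is nil when the Deployment does not exist yet.
func decide(current *DeploymentSpec, desired DeploymentSpec) Action {
	if current == nil {
		return ActionCreate
	}
	if reflect.DeepEqual(*current, desired) {
		// Nothing changed: leave the running pods alone.
		return ActionNone
	}
	return ActionUpdate
}

func main() {
	running := &DeploymentSpec{Image: "dataplane:v1", Replicas: 2, Args: []string{"--verbose"}}
	same := DeploymentSpec{Image: "dataplane:v1", Replicas: 2, Args: []string{"--verbose"}}
	changed := DeploymentSpec{Image: "dataplane:v2", Replicas: 2, Args: []string{"--verbose"}}

	fmt.Println(decide(nil, same))
	fmt.Println(decide(running, same))
	fmt.Println(decide(running, changed))
}
```

With an explicit comparison like this, a freshly restarted operator that renders the same configuration would make no write at all, so it would not depend on the update being ignored downstream.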

This issue tracks progress on implementing high availability mode. One possible approach is to add a command-line switch that enables graceful operator restarts.
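A rough sketch of how such a switch could gate the shutdown cycle, using only the Go standard library. The flag name `--graceful-restart`, the `Config` type, and the `parseConfig` and `shutdown` helpers are all assumptions made for illustration; the operator does not necessarily expose any of them.

```go
package main

import (
	"flag"
	"fmt"
)

// Config holds the hypothetical operator settings used in this sketch.
type Config struct {
	GracefulRestart bool
}

// parseConfig reads the hypothetical --graceful-restart switch from args.
func parseConfig(args []string) (Config, error) {
	fs := flag.NewFlagSet("operator", flag.ContinueOnError)
	graceful := fs.Bool("graceful-restart", false,
		"skip status invalidation when the operator terminates")
	if err := fs.Parse(args); err != nil {
		return Config{}, err
	}
	return Config{GracefulRestart: *graceful}, nil
}

// shutdown runs the termination sequence. invalidate stands in for the step
// that invalidates Gateway API statuses; it is skipped when graceful
// restarts are enabled.
func shutdown(cfg Config, invalidate func()) {
	if cfg.GracefulRestart {
		return
	}
	invalidate()
}

func main() {
	cfg, err := parseConfig([]string{"--graceful-restart"})
	if err != nil {
		panic(err)
	}
	invalidated := false
	shutdown(cfg, func() { invalidated = true })
	fmt.Println("statuses invalidated:", invalidated)
}
```

Defaulting the switch to off keeps the current behavior for existing deployments, so only high availability setups would opt in to skipping the invalidation step.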

l7mp / stunner-gateway-operator

High availability mode #53