[Open] Piotreqsl opened this issue 6 months ago
Zendesk ticket #3076 has been linked to this issue.
Gloo Edge Product
Enterprise
Gloo Edge Version
v1.19.3
Kubernetes Version
v1.27.8
Describe the bug
We have a problem with the Gloo gateway-proxy pods during our cluster upgrades.
Our application needs to handle many WebSocket connections and ensure each of them is served properly. We've implemented custom PreStop hooks in our pods, and they work fine during normal upgrades (e.g. changing the Docker tag) or HPA scaling: the pod waits until the last WebSocket connection terminates, or until a specified timeout is reached.
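For reference, a minimal sketch of the kind of PreStop hook we use on our application pods (the port 8080 and the ss-based connection check are illustrative, not our exact implementation):

```yaml
# Illustrative only -- real hook is application-specific.
lifecycle:
  preStop:
    exec:
      command:
        - /bin/sh
        - -c
        - |
          # Block until no established connections remain on the app
          # port, or give up after ~300s. Note the kubelet still
          # enforces terminationGracePeriodSeconds on top of this.
          i=0
          while [ "$(ss -Ht state established '( sport = :8080 )' | wc -l)" -gt 0 ]; do
            i=$((i+1)); [ "$i" -ge 300 ] && exit 0
            sleep 1
          done
```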
The problem occurs when we upgrade the whole cluster (e.g. changing the EC2 instance type). We observed that the gateway-proxy pods are killed during the upgrade, and as a result the WebSocket connections through them to our application pods are dropped.
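One thing we considered is a PodDisruptionBudget to at least throttle eviction-driven restarts during node drains; a sketch, assuming the default `gloo: gateway-proxy` pod label from the Helm chart:

```yaml
# Sketch: limit concurrent voluntary evictions of gateway-proxy
# pods during node drains. The label selector assumes the default
# labels applied by the Gloo Edge Helm chart.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: gateway-proxy-pdb
  namespace: gloo-system
spec:
  maxUnavailable: 1
  selector:
    matchLabels:
      gloo: gateway-proxy
```

As we understand it, though, a PDB only slows voluntary evictions; it does not keep a pod alive once its node is actually terminated.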
We've read the https://docs.solo.io/gloo-edge/latest/operations/advanced/zero-downtime-gateway-rollout/ documentation, which talks about health checks, but we're not sure how that would help us keep long-lived connections open while our pods are in the Terminating state.
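This is our reading of what the docs describe (field names below are taken from our understanding of the Helm chart and Gateway CRD; please correct us if they're wrong):

```yaml
# Helm values: on shutdown, fail Envoy's health check and sleep
# before SIGTERM so the load balancer can take the pod out of
# rotation first. Values are examples, not recommendations.
gatewayProxies:
  gatewayProxy:
    podTemplate:
      terminationGracePeriodSeconds: 60
      gracefulShutdown:
        enabled: true
        sleepTimeSeconds: 25
---
# Gateway option enabling Envoy's health check filter, which the
# load balancer can poll during the drain window.
apiVersion: gateway.solo.io/v1
kind: Gateway
metadata:
  name: gateway-proxy
  namespace: gloo-system
spec:
  options:
    healthCheck:
      path: /envoy-hc
```

Our concern is that this only diverts new connections away from a terminating pod; existing long-lived WebSockets would still be cut off once the grace period expires.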
We'd kindly ask for advice on how to configure the Gloo gateway-proxy so that it does not kill existing connections.
Expected Behavior
Gloo gateway-proxy should wait for the last connection to terminate, or time out after a specified period.
Steps to reproduce the bug
Additional Environment Detail
No response
Additional Context
No response