haproxytech / kubernetes-ingress

HAProxy Kubernetes Ingress Controller
https://www.haproxy.com/documentation/kubernetes/
Apache License 2.0

High availability loss caused by simultaneous HAProxy configuration changes #564

Open zimnx opened 1 year ago

zimnx commented 1 year ago

When a HAProxy configuration change is required and there are active connections, HAProxy keeps serving those connections and only completes the reload once they are closed or the hard-stop-after timeout expires, at which point the remaining connections are forcibly terminated. This behavior becomes problematic in environments where multiple HAProxy ingress controllers watch the same Kubernetes resources, such as Services and Ingresses: when one of these resources is updated, all instances of the HAProxy ingress controller detect the configuration change simultaneously.

This leads to a situation where the hard-stop-after timeout is effectively triggered by each instance. Consequently, all active connections are closed simultaneously, causing significant availability issues.
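
For reference, a minimal sketch of where this timeout is usually set, assuming the controller's documented hard-stop-after ConfigMap option; the ConfigMap name, namespace, and value here are illustrative:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: haproxy-kubernetes-ingress   # illustrative; whatever --configmap points at
  namespace: haproxy-controller
data:
  # Upper bound on how long an old HAProxy worker may keep draining connections
  # after a reload before it is force-stopped. Every replica applies the same
  # value, which is why all replicas can cut their remaining connections at
  # roughly the same moment.
  hard-stop-after: "30m"
```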

Steps to reproduce:

  1. Set up an environment with multiple HAProxy ingress controller instances.
  2. While there are active connections, update one of the Ingresses in a way that requires a HAProxy configuration change.

Expected behavior:

The HAProxy ingress controllers should coordinate the configuration rollout in a way that prevents simultaneous restarts by multiple instances. This coordination should ensure a smooth transition without abruptly closing all active connections at once.

Actual behavior:

The current behavior leads to simultaneous HAProxy restarts because all instances detect the configuration change at once. As a result, the hard-stop-after timeout is reached by each instance, and all active connections are closed concurrently.

ivanmatmati commented 1 year ago

Hi @zimnx , the controllers are all independent and have no means to coordinate. It's intended that they react simultaneously, but keep in mind that a transaction is actually a time window for changes, set to 5 seconds by default. I guess the instances are not all started at the same millisecond, so there will always be a period where one has already applied the new configuration while another still has the old one. Maybe you can tweak the sync-period parameter with different values for different controllers to minimize the risk you're talking about.
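
A minimal sketch of that suggestion, assuming the controller's documented --sync-period and --configmap startup arguments; the values, names, and two-copy layout are purely illustrative:

```yaml
# Relevant fragment of each controller Deployment's Pod spec: two copies of the
# same manifest, each with a slightly different sync-period so their reload
# windows are less likely to line up.
containers:
  - name: haproxy-ingress
    image: haproxytech/kubernetes-ingress
    args:
      - --configmap=haproxy-controller/haproxy-kubernetes-ingress
      - --sync-period=5s        # first instance: default cadence
---
containers:
  - name: haproxy-ingress
    image: haproxytech/kubernetes-ingress
    args:
      - --configmap=haproxy-controller/haproxy-kubernetes-ingress
      - --sync-period=8s        # second instance: slightly longer window
```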

fabianonunes commented 1 year ago

@zimnx, since reloads are seamless, you could use a different hard-stop-after for each instance ~or even disable it to prevent concurrent connection interruptions~ (bad advice).
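
To make that concrete, a sketch of what a per-instance hard-stop-after could look like, assuming each instance runs as its own Deployment pointed at its own ConfigMap via --configmap (names and values are illustrative):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: haproxy-ingress-a
  namespace: haproxy-controller
data:
  hard-stop-after: "15m"   # instance A force-stops old workers after 15 minutes
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: haproxy-ingress-b
  namespace: haproxy-controller
data:
  hard-stop-after: "30m"   # instance B drains twice as long, so the two never cut over together
```

The catch, raised below, is that each instance then needs its own Deployment and ConfigMap, so one Deployment with N replicas turns into N Deployments of 1 replica each.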

tnozicka commented 1 year ago

It seems like all the tips just make this less likely to happen, but how can users make sure it's always HA? Should something coordinate the restarts? (Live reloads should be fine.)

the controllers are all independent and have no means to coordinate.

I assume they could sync using the kube API, which all of them are connected to. It kind of seems like the issue comes from running a supervisor that restarts the app inside the container, whereas other HA apps let Kubernetes roll out the change through a workload controller that respects HA and PodDisruptionBudgets, so they don't have to handle it on their own.
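
For contrast, a minimal sketch of that kind of guardrail, assuming the controller Pods carry an app=haproxy-ingress label (illustrative):

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: haproxy-ingress-pdb
  namespace: haproxy-controller
spec:
  maxUnavailable: 1          # eviction-driven disruptions (e.g. node drains) take down at most one Pod at a time
  selector:
    matchLabels:
      app: haproxy-ingress   # assumed label on the controller Pods
```

A budget like this only constrains disruptions that go through Kubernetes (evictions, drains, rollouts with their own maxUnavailable); reloads triggered inside a running container bypass it entirely, which is the gap described here.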

@zimnx, since reloads are seamless, you could use a different hard-stop-after for each instance

@fabianonunes How would you do this in production? Given you had 1 Deployment with 5 replicas, wouldn't that mean 5 Deployments with 1 replica each in order to have separate configs?

or even disable it to prevent concurrent connection interruptions.

What happens if it is disabled? Will the old processes pile up and OOM at some point?

If you use hard-stop-after, with enough changes coming in frequently (Services or Ingresses being created / updated), can enough old processes pile up that the Pod gets OOM-killed?

hdurand0710 commented 1 year ago

Hi @zimnx , As @ivanmatmati said, the controllers are all independent and have no means to coordinate. To help make this less likely to happen, we could add another annotation, hard-stop-after-random, with the following behavior:

That would not solve everything, but it would help, and it would allow having only 1 Deployment with several replicas but different values.
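
Purely as an illustration of the idea (the annotation below is hypothetical and its exact semantics are not spelled out in this thread), the controller ConfigMap might gain something like:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: haproxy-kubernetes-ingress
  namespace: haproxy-controller
data:
  hard-stop-after: "30m"
  # Hypothetical annotation: each replica would add its own random offset,
  # bounded by this value, to hard-stop-after, so identical replicas no longer
  # force-close their remaining connections at the same instant.
  hard-stop-after-random: "5m"
```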

How would you like this option?

zimnx commented 1 year ago

This would be a workaround which brings nondeterministic behavior, with a chance that all nodes restart at once even when this parameter is set. In the long run, the chance of hitting that might be significant.

I think we could live with it temporarily, but the issue still persists. The config shouldn't be rolled out simultaneously on all nodes.

oktalz commented 12 months ago

hi @zimnx

which brings nondeterministic behavior

If I'm not mistaken, the deterministic behavior we currently have with the ingress controller is not good, but at the same time the proposed nondeterministic one is also not good. And there is no third option: you either are or are not deterministic.

In general, if you have 5 replicas then they are exactly that: replicas, duplicates of each other that behave and act on changes in the same manner.

Should something coordinate the restarts

Yes, but the answer leads me to point you to our products on the Enterprise side, specifically Fusion, which can solve that issue.

Following this conversation and the proposed solution, it seems there is room for improvement, but if we bring randomness to the table, more people might be confused, so for now I'm going to put this on hold. In the future, as potentially new features are added to HAProxy, we might even have different options.