Closed philipgough closed 1 year ago
Thanks @matej-g - I'll fix up based on your suggestions, but I also think the concept of disruptions (not distributions, thanks autocomplete) is a known and well-documented concept in Kubernetes. See https://kubernetes.io/docs/concepts/workloads/pods/disruptions/ for example. PodDisruptionBudgets are named accordingly, because they set the budget for such disruptions.
So I think in the end we have three things to reason about:
That is why I named the flag as I did: it supports the removal of replicas regardless of cause.
Thanks for the explanation @PhilipGough, I stand corrected 🙇, this is the first time I'm learning about this. It still feels as though we're really operating on pod status rather than on disruption (we don't know what is going on with the pod(s) or whether this is an (in)voluntary disruption, since we only know the pod's status). But since the concept does exist and is well understood, my point is even less important.
This change allows the user, behind a flag, to provide a real-world view of the replicas that are in an operable state within the hashring.
A previous comment warned about the consequences of adjusting the hashring during scale-down events. However, this view only makes sense and works under the assumption that the disruption is temporary or unintended. We believe there are some benefits to supporting this behaviour:
Reducing error rate during update events
Configuring this flag allows the operator to provide a realistic view of the hashring. From testing, we have seen that during voluntary or involuntary disruptions without this flag, traffic to unavailable or non-existent receivers quickly builds up under load and can rapidly make the situation hard to recover from. This flag does not entirely prevent that, but we have also observed that during voluntary disruptions (rollouts/updates) our error rate is much lower.
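To make the behaviour described above concrete, here is a minimal sketch of the idea, not the code in this PR. The `Replica` type, the `hashringEndpoints` function, the `onlyReady` parameter and the endpoint names are all hypothetical and only illustrate how a hashring could be built from replicas in an operable state rather than from every expected replica.

```go
package main

import "fmt"

// Replica is a hypothetical, simplified view of a receiver pod.
// Ready stands in for whatever readiness signal the controller observes.
type Replica struct {
	Endpoint string
	Ready    bool
}

// hashringEndpoints returns the endpoints that would be written to the hashring.
// With onlyReady=false every expected replica is listed, whatever its status.
// With onlyReady=true (the flag-enabled behaviour sketched here) replicas that
// are not ready are left out, regardless of why they are unavailable.
func hashringEndpoints(replicas []Replica, onlyReady bool) []string {
	endpoints := make([]string, 0, len(replicas))
	for _, r := range replicas {
		if onlyReady && !r.Ready {
			continue
		}
		endpoints = append(endpoints, r.Endpoint)
	}
	return endpoints
}

func main() {
	// Assumed example state: receive-1 is mid-rollout and not ready.
	replicas := []Replica{
		{Endpoint: "receive-0:10901", Ready: true},
		{Endpoint: "receive-1:10901", Ready: false},
		{Endpoint: "receive-2:10901", Ready: true},
	}

	fmt.Println(hashringEndpoints(replicas, false))
	fmt.Println(hashringEndpoints(replicas, true))
}
```

With the flag-like parameter disabled, the not-ready replica stays in the ring and keeps attracting traffic. With it enabled, only `receive-0` and `receive-2` remain, which is the "realistic view" referred to above.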