redhat-cop / keepalived-operator

An operator to manage VIPs backed by keepalived
Apache License 2.0

spread VIPs across instances #84

Open aneagoe opened 2 years ago

aneagoe commented 2 years ago

PR #53 already addresses this concern, but only partially; its impact is, unfortunately, quite limited, since it's quite rare for a single service to have multiple VIPs that could be balanced. The current shortcoming is that if one deploys keepalived-operator on OKD/OpenShift 4.x, most VIPs end up colocated on a single node. This is especially problematic for services that require quorum, like Kafka or MongoDB with multiple instances, and it also hurts traffic scalability, as almost all traffic is handled by one node. Ideally, by default, keepalived-operator would try to randomize the VIP distribution across the members. Further, it would be great if, based on an annotation, certain VIPs were kept on different members (for example, an "anti-affinity" for Kafka broker services, so that their VIPs are not placed on the same node).

raffaelespazzoli commented 2 years ago

If you can tell me how you would solve this problem with a direct keepalived deployment, I might be able to support this request in the operator. Right now, keepalived itself decides where to publish the VIPs.


aneagoe commented 2 years ago

I'm not sure what the best way to approach this is, and I'm not very familiar with keepalived. However, the difficulty seems to lie in how to manage/mount the keepalived configmap that holds the config. Say we have 3 VIPs and two replicas in the daemonset: one node would then be MASTER for 2 VIPs and the other for 1. If the daemonset is scaled to 3, the operator would reconfigure the replicas so that each VIP is MASTER on a different node (see the sketch below). As for the "anti-affinity" approach I mentioned earlier, the operator would try to ensure that VIPs associated with services labeled/annotated with a particular key are spread across keepalived instances. The high-level objectives are a best-effort uniform spread of VIPs, plus some way to control which VIPs must not land on the same underlying node.
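For illustration, here is a sketch of what such a spread could look like in raw keepalived terms, assuming the operator could render a different config per node (today it mounts one shared configmap everywhere, as noted below). The instance names, interface, VIPs, and priority values are all made up; the underlying mechanism is just standard VRRP: for each vrrp_instance, the node advertising the highest priority wins the MASTER election and holds that VIP.

```
# Hypothetical config rendered for node-a. node-b would get the same
# file with the two priorities swapped, so each node becomes MASTER
# for a different VIP and the VIPs are spread instead of colocated.
vrrp_instance VIP_1 {
    state BACKUP
    interface eth0
    virtual_router_id 51
    priority 200              # node-b would get e.g. 100 here
    virtual_ipaddress {
        192.168.0.10
    }
}

vrrp_instance VIP_2 {
    state BACKUP
    interface eth0
    virtual_router_id 52
    priority 100              # node-b would get e.g. 200 here
    virtual_ipaddress {
        192.168.0.11
    }
}
```

Inverting the priorities for a pair of VIPs is also exactly what the proposed anti-affinity annotation would translate to at the keepalived level.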

raffaelespazzoli commented 2 years ago

I get the concept. As of now, each instance of the daemonset mounts the same configmap. So, at startup, none of the nodes is MASTER for a given VIP (or all of them are; I don't remember exactly, but it does not matter). Then keepalived runs a leader-election round and a MASTER is elected. There must be a reason why you see the same node winning all the leader elections and collecting all of the VIPs. I don't know what it is (it could be something in your environment, for example one node being faster than the others), and I'm not sure whether this behavior can be governed via keepalived configuration. Would you be willing to do this research?
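One plausible explanation, hedged because it depends on the VRRP implementation details: with a single shared configmap, every node advertises the same priority for every vrrp_instance, and VRRP breaks equal-priority ties in favor of the higher primary IP address. The node with the highest IP would then deterministically win every election and collect all the VIPs. A minimal sketch of that situation (names and addresses invented):

```
# Identical on node-a, node-b, node-c (one shared configmap).
# With equal priorities, VRRP's tie-breaker is the higher primary IP
# address, so the same node can win the election for every VIP.
vrrp_instance VIP_1 {
    state BACKUP
    interface eth0
    virtual_router_id 51
    priority 100              # same value on every node
    virtual_ipaddress {
        192.168.0.10
    }
}
```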


aneagoe commented 2 years ago

I'm already looking at options. Regardless of the bootstrap mechanism, it seems that a maintenance window with a rolling reboot of the nodes would still leave you with all VIPs handled by a single node. Using track_file looks interesting, but at the moment I'm not sure what the simplest approach would be without adding complexity on the operator side. I'm still experimenting with it and will post back once I have something more concrete.
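For reference, a sketch of the track_file idea, assuming a reasonably recent keepalived: the shared configmap stays identical on every node, and keepalived adds weight × (the integer value read from a node-local file) to the instance's priority. Some node-local agent (hypothetical here, as are the names, path, and weights) would write a different value on each node to decide which node masters which VIP.

```
# Shared config, identical on all nodes; only the contents of the
# tracked file differ per node. The effective priority for VIP_1 is
# priority + weight * <value in file>, so writing 1 into the file on
# exactly one node raises that node's priority by 50 and makes it
# MASTER for this VIP. (Older keepalived versions spell the top-level
# block vrrp_track_file; newer ones also accept track_file.)
vrrp_track_file vip1_pref {
    file "/etc/keepalived/prefs/vip1"
    weight 50
}

vrrp_instance VIP_1 {
    state BACKUP
    interface eth0
    virtual_router_id 51
    priority 100
    virtual_ipaddress {
        192.168.0.10
    }
    track_file {
        vip1_pref
    }
}
```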