redhat-cop / keepalived-operator

An operator to manage VIPs backed by keepalived
Apache License 2.0

Fix spreadvips bug that leads to KeepalivedGroup unable to start #58

Closed tommasopozzetti closed 3 years ago

tommasopozzetti commented 3 years ago

The spreadvips annotation introduced a bug: if the KeepalivedGroup CR is deployed while services that target it already exist, and all of them specify the spreadvips annotation, the keepalived pods can fail to start.

This happens because the template uses the modulus function to assign the preferred owners among the existing keepalived pods. If no keepalived pods exist yet because the KeepalivedGroup is new, the template cannot be rendered: the modulus would divide by 0 (the length of the array containing the names of the keepalived pods). Unfortunately, the template is also what creates those pods, so the KeepalivedGroup remains deadlocked forever.

This PR fixes the issue by defaulting to the old non-spread algorithm if no pods are available yet. This allows the template to be generated, which creates the keepalived pods, which in turn triggers a reconciliation that correctly switches to the spreadvips configuration.
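
To illustrate the guard described above, here is a minimal Go sketch. It is not the operator's actual template code: the function name `preferredOwner`, its signature, and the pod names are hypothetical, and it assumes a non-negative VIP index. It only shows why an empty pod list must be handled before taking the modulus, since integer division by zero panics in Go.

```go
package main

import "fmt"

// preferredOwner is a hypothetical helper, not part of keepalived-operator.
// It picks the pod that should preferentially own the VIP at position
// vipIndex (assumed non-negative) by round-robin over the pod list.
// The boolean is false when no pods exist yet, signalling that the caller
// should fall back to the non-spread behaviour instead of computing
// vipIndex % 0, which would panic.
func preferredOwner(pods []string, vipIndex int) (string, bool) {
	if len(pods) == 0 {
		return "", false // no pods yet: spreading is not possible
	}
	return pods[vipIndex%len(pods)], true
}

func main() {
	// First reconciliation: the KeepalivedGroup is new, so no pods exist.
	if _, ok := preferredOwner(nil, 0); !ok {
		fmt.Println("no pods yet, using non-spread configuration")
	}

	// Later reconciliation: pods exist, so VIPs are spread across them.
	pods := []string{"keepalived-a", "keepalived-b"} // illustrative names
	for i := 0; i < 3; i++ {
		owner, _ := preferredOwner(pods, i)
		fmt.Printf("vip %d prefers %s\n", i, owner)
	}
}
```

Returning a flag rather than an error keeps the fallback explicit at the call site: the first pass renders a valid configuration without spreading, and the next reconciliation, triggered once the pods exist, takes the spread path.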