reactive-tech / kubegres

Kubegres is a Kubernetes operator that deploys one or more clusters of PostgreSQL instances and manages database replication, failover and backup.
https://www.kubegres.io
Apache License 2.0
1.32k stars, 74 forks

Implement per node Affinity & Tolerations #102

Open glebiller opened 2 years ago

glebiller commented 2 years ago

Currently, the Affinity & Tolerations are set uniformly for all the StatefulSets. This makes it impossible to pin an individual StatefulSet to a particular region or node.

This PR adds a new field spec.nodeSets that is mutually exclusive with spec.replicas. The new field allows setting an Affinity or Tolerations for a particular node while keeping a default in spec.scheduler.

Added the required tests to cover the use cases.

NB: I am considering simplifying the logic so that the field replicas generates a matching number of empty nodes in nodeSets. This would simplify the reconciliation.
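
For readers following the thread, here is a minimal Go sketch of the two ideas above: rejecting a spec that sets both spec.replicas and spec.nodeSets, and expanding replicas into that many empty node sets. The types Spec and NodeSet and the function normalize are hypothetical names chosen for illustration; they are not taken from the PR or from the Kubegres API, and affinity and tolerations are reduced to plain strings.

```go
package main

import (
	"errors"
	"fmt"
)

// NodeSet is a hypothetical stand-in for one entry of spec.nodeSets.
// Affinity and tolerations are simplified to strings for brevity.
type NodeSet struct {
	Affinity    string
	Tolerations []string
}

// Spec is a hypothetical, trimmed-down spec holding only the two fields
// discussed in this thread.
type Spec struct {
	Replicas *int32
	NodeSets []NodeSet
}

// normalize rejects specs that set both fields and expands replicas into
// the same number of empty node sets, so later code only reads NodeSets.
func normalize(s Spec) ([]NodeSet, error) {
	if s.Replicas != nil && len(s.NodeSets) > 0 {
		return nil, errors.New("spec.replicas and spec.nodeSets are mutually exclusive")
	}
	if s.Replicas != nil {
		if *s.Replicas < 0 {
			return nil, errors.New("spec.replicas must not be negative")
		}
		// Empty node sets fall back to the defaults (spec.scheduler in the PR).
		return make([]NodeSet, *s.Replicas), nil
	}
	return s.NodeSets, nil
}

func main() {
	three := int32(3)

	sets, err := normalize(Spec{Replicas: &three})
	fmt.Println(len(sets), err)

	_, err = normalize(Spec{Replicas: &three, NodeSets: []NodeSet{{Affinity: "zone-a"}}})
	fmt.Println(err)
}
```

With this shape, the reconciliation would only ever iterate over node sets, which is presumably the simplification the NB has in mind.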

alex-arica commented 2 years ago

@glebiller Thank you for the PR.

I will review it this weekend and I will let you know on Monday morning about it. Thanks for your contribution.

alex-arica commented 2 years ago

@glebiller apologies that I could not reply earlier.

I understand that you would like to have the option to apply specific Affinity & Tolerations rules to specific StatefulSets.

Kubegres can replace a Primary Postgres StatefulSet with a Replica StatefulSet. It can also remove a Replica Postgres StatefulSet that is identified as unavailable and replace it with a new Replica StatefulSet. As a result of this failover behaviour, the last instance index is incremented each time a name is assigned to a newly created StatefulSet.

For example, let's say we have a cluster of Postgres with 3 StatefulSets, with the names: mypostgres-1, mypostgres-2 and mypostgres-3.

A name of the form "[postgres clustername]-[integer]" has the instance index [integer]. In mypostgres-1, the instance index is 1.

If the Primary StatefulSet mypostgres-1 is unavailable, it is replaced by the Replica StatefulSet mypostgres-2, which is promoted to be the new Primary, and a new Replica StatefulSet is created. After the failover, the cluster will contain: mypostgres-2, mypostgres-3 and mypostgres-4.

The instance index 1 does not exist once the failover is completed.

If a Replica StatefulSet is unavailable, let's say mypostgres-3, then it will be replaced by a new Replica StatefulSet. After the failover, the cluster will contain: mypostgres-2, mypostgres-4 and mypostgres-5.

The instance index 3 does not exist once the failover is completed.
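
The naming behaviour described in these two examples can be reproduced with a small Go model. This is only a sketch of the rule as stated in this comment (drop the failed instance, then create a new one at the last index plus one); the type cluster and its methods are hypothetical and do not correspond to the Kubegres source code.

```go
package main

import "fmt"

// cluster is a hypothetical model of a Postgres cluster: the live instance
// indexes plus the highest index assigned so far.
type cluster struct {
	name      string
	indexes   []int
	lastIndex int
}

// replace drops the failed instance and adds a new one whose index is
// lastIndex+1. It does nothing if the failed index is not part of the cluster.
func (c *cluster) replace(failed int) {
	kept := make([]int, 0, len(c.indexes))
	found := false
	for _, i := range c.indexes {
		if i == failed {
			found = true
			continue
		}
		kept = append(kept, i)
	}
	if !found {
		return
	}
	c.lastIndex++
	c.indexes = append(kept, c.lastIndex)
}

// names renders the StatefulSet names as "[cluster name]-[instance index]".
func (c *cluster) names() []string {
	out := make([]string, 0, len(c.indexes))
	for _, i := range c.indexes {
		out = append(out, fmt.Sprintf("%s-%d", c.name, i))
	}
	return out
}

func main() {
	c := &cluster{name: "mypostgres", indexes: []int{1, 2, 3}, lastIndex: 3}

	c.replace(1) // the Primary mypostgres-1 fails
	fmt.Println(c.names()) // [mypostgres-2 mypostgres-3 mypostgres-4]

	c.replace(3) // the Replica mypostgres-3 fails
	fmt.Println(c.names()) // [mypostgres-2 mypostgres-4 mypostgres-5]
}
```

The model leaves out promotion of a Replica to Primary, since only the resulting set of names matters for the question that follows.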

Correct me if I am wrong, but it seems that spec.nodeSets uses the array index to identify the instance index to which the Affinity & Tolerations rules apply. If the failover use cases above happen, how do you keep track of the configuration, given that the instance indexes change?
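
The concern can be made concrete with a short Go sketch. It assumes, purely for illustration, that the rules for instance index i are looked up at array position i-1; this lookup, the function rulesFor and the zone names are hypothetical and not taken from the PR.

```go
package main

import "fmt"

// rulesFor returns the rule stored at array position instanceIndex-1.
// This positional lookup is an assumption made for this example only.
func rulesFor(nodeSets []string, instanceIndex int) (string, bool) {
	pos := instanceIndex - 1
	if pos < 0 || pos >= len(nodeSets) {
		return "", false
	}
	return nodeSets[pos], true
}

func main() {
	// One rule per instance of the original cluster: mypostgres-1, -2 and -3.
	nodeSets := []string{"zone-a", "zone-b", "zone-c"}

	// After the first failover the live instance indexes are 2, 3 and 4.
	for _, idx := range []int{2, 3, 4} {
		rule, ok := rulesFor(nodeSets, idx)
		fmt.Printf("mypostgres-%d: rule=%q found=%t\n", idx, rule, ok)
	}
}
```

Under that assumption, mypostgres-4 finds no entry at all and the "zone-a" rule is no longer used by any instance, which is the kind of drift the question is asking about.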

glebiller commented 2 years ago

@alex-arica No worries for the delay :)

I was aware of the behavior you described. The Failover unit test was passing because it re-created the missing instance instead of incrementing the instance index to create a new StatefulSet.

I just committed the second part of the changes, which does the following:

That change should also help with #88, since it keeps the names of the StatefulSets organized, in addition to making it possible to change each node's Tolerations & Affinities separately.

alex-arica commented 2 years ago

Thank you for the update @glebiller

I will review it as soon as I can and let you know.

kmiszta commented 1 year ago

Hi, will this be merged at some point in the future?