[Open] glebiller opened this pull request 2 years ago
@glebiller Thank you for the PR.
I will review it this weekend and let you know on Monday morning. Thanks for your contribution.
@glebiller apologies that I could not reply earlier.
I understand that you would like to have the option to apply specific Affinity & Tolerations rules to specific StatefulSets.
Kubegres can replace a Primary Postgres StatefulSet with a Replica StatefulSet, and it can also remove a Replica Postgres StatefulSet that is identified as unavailable and replace it with a new Replica StatefulSet. As a result of this failover behaviour, the last instance index is incremented when assigning names to the newly created StatefulSets.
For example, let's say we have a cluster of Postgres with 3 StatefulSets, with the names: mypostgres-1, mypostgres-2 and mypostgres-3.
The name "[postgres clustername]-[integer] has for instance index [integer]. In mypostgres-1 the instance index is 1.
If the Primary StatefulSet mypostgres-1 is unavailable, it is replaced by the Replica StatefulSet mypostgres-2, which will be promoted to Primary, and a new Replica StatefulSet will be created. After the failover, the cluster will contain: mypostgres-2, mypostgres-3 and mypostgres-4.
The instance index 1 does not exist once the failover is completed.
If a Replica StatefulSet is unavailable, let's say mypostgres-3, then it will be replaced by a new Replica StatefulSet. After the failover, the cluster will contain: mypostgres-2, mypostgres-4 and mypostgres-5.
The instance index 3 does not exist once the failover is completed.
Correct me if I am wrong, but it seems that the array `spec.nodeSets` uses its array index to identify the instance index on which to apply the Affinity & Tolerations rules. If the failover use cases above happen, how do you keep track of the configuration, since the instance indexes would change?
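To make the concern concrete, here is a minimal YAML sketch; the exact shape of `spec.nodeSets` below is assumed for illustration and may differ from what the PR actually implements:

```yaml
# Hypothetical fragment: the schema of spec.nodeSets is assumed for illustration.
spec:
  nodeSets:
    - tolerations:        # array index 0 -> intended for instance index 1 (mypostgres-1)
        - key: zone-a
          operator: Exists
    - tolerations:        # array index 1 -> intended for instance index 2 (mypostgres-2)
        - key: zone-b
          operator: Exists
    - tolerations:        # array index 2 -> intended for instance index 3 (mypostgres-3)
        - key: zone-c
          operator: Exists
# After mypostgres-1 fails over, the cluster contains mypostgres-2, mypostgres-3
# and mypostgres-4: instance index 1 no longer exists, so array index 0 no longer
# points at any running StatefulSet.
```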
@alex-arica No worries for the delay :)
I understood the behavior you described before, and the Failover unit test was working because it re-creates the missing Instance instead of incrementing the instance index to create a new StatefulSet.
I just committed the second part of the changes, which replaces the `Instance` type `int32` with `string`. That change should also help with #88 since it will keep the names of the StatefulSets organized, in addition to making it possible to change the Tolerations & Affinities configuration of each node separately.
Thank you for the update @glebiller
I will review it as soon as I can and let you know.
Hi, will this be merged at some point in the future?
Currently, the Affinity & Tolerations are set uniformly for all the StatefulSets. This prevents pinning a StatefulSet to a particular region or node.
This PR adds a new field `spec.nodeSets` that is mutually exclusive with `spec.replicas`. The new field allows setting an Affinity or Tolerations for a particular node while keeping a default in `spec.scheduler`. I added the required tests to cover the use cases.
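For illustration, a manifest using the new field might look like the sketch below; `spec.image`, `spec.database` and `spec.scheduler` are existing Kubegres fields, while the shape of `spec.nodeSets` shown here is an assumption rather than the PR's final schema:

```yaml
apiVersion: kubegres.reactive-tech.io/v1
kind: Kubegres
metadata:
  name: mypostgres
spec:
  image: postgres:14.1
  database:
    size: 8Gi
  scheduler:            # existing field: default Tolerations/Affinity applied to all StatefulSets
    tolerations:
      - key: database
        operator: Exists
  nodeSets:             # field added by this PR; exact schema assumed here for illustration
    - tolerations:
        - key: zone-a-only
          operator: Exists
    - tolerations:
        - key: zone-b-only
          operator: Exists
```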
NB: I am considering simplifying the logic so that the field `replicas` generates a matching number of empty entries in `nodeSets`. This would simplify the reconciliation; see the sketch below.
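As a rough sketch of that simplification (again assuming a hypothetical `nodeSets` schema), a spec that only sets `replicas: 3` would be reconciled as if it contained three empty `nodeSets` entries, each inheriting the defaults from `spec.scheduler`:

```yaml
# With the proposed simplification, a spec that only sets replicas...
spec:
  replicas: 3
---
# ...would be reconciled as if it contained the same number of empty nodeSets
# entries (hypothetical schema), each falling back to the defaults in spec.scheduler:
spec:
  nodeSets:
    - {}
    - {}
    - {}
```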