Open vishal-biyani opened 4 years ago
I'm proposing this way to scale down TServers (initial implementation is in #17):

- Record `TargetedTServerReplicas: 3` in the status (3 is the number to which the user wants to scale down to) and set the status condition `ScalingDownTServers: True` on the ybcluster resource.
- Annotate the Pods that are to be removed with `yb.com/blacklist: true`.
- While `ScalingDownTServers: True` or `MovingData: True`, use the `LastTransitionTime` of the `ScalingDownTServers: True` condition to decide when to proceed, i.e. operation 4.
- Scale down the StatefulSet only when `TargetedTServerReplicas: 3` is set, the Pods annotated `yb.com/blacklist: true` are in YB-Master's blacklist, `ScalingDownTServers: True`, `MovingData: False`, and it is updated (at least 5 minutes?) after the status condition `ScalingDownTServers`'s update time. (This will make sure that we don't update the STS immediately, and instead wait for the data move to start and be reflected correctly in the ybcluster's status.) `MovingData`'s value reflects that: `True` if progress is not 100%, otherwise `False`.
- For every Pod carrying `yb.com/blacklist: true`: if that is present, the controller will make sure that this Pod's FQDN is in the blacklist as well, and then set `yb.com/synced: true` on the Pod.

I am presuming that this has not been implemented, correct? What are the day-2 actions that the operator currently facilitates? What is the intended roadmap of this operator vs the Rook Yugabyte functionality?
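As a sketch of the shapes proposed above (the exact CRD schema, condition format, and field casing are assumptions, not the operator's actual API), the status and annotations could look roughly like:

```yaml
# Hypothetical ybcluster status during a scale-down to 3 TServers.
status:
  targetedTServerReplicas: 3
  conditions:
    - type: ScalingDownTServers
      status: "True"
      lastTransitionTime: "2020-04-01T10:00:00Z"
    - type: MovingData              # True while data-move progress < 100%
      status: "False"
      lastTransitionTime: "2020-04-01T10:10:00Z"
---
# Hypothetical annotations on a Pod picked for removal.
apiVersion: v1
kind: Pod
metadata:
  name: yb-tserver-4
  annotations:
    yb.com/blacklist: "true"   # controller ensures this Pod's FQDN is blacklisted
    yb.com/synced: "true"      # set once the blacklist entry is confirmed
```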
This issue is related to https://github.com/yugabyte/yugabyte-db/issues/4047, https://github.com/yugabyte/yugabyte-db/issues/4037, and conversation on slack around scaling of YugaByte cluster. The issue summarises use cases, possible solutions, and draft design of potential change to the YB operator.
There are some issues in the K8s community describing scenarios that need more hooks than just `PreStop` and `PostStart`; one that links to several related discussions is https://github.com/kubernetes/kubernetes/issues/25275. The limitation of the `PreStop` hook is that it will be executed even in case of liveness probe failure, preemption, and resource contention.

One of the other options discussed is using VPA (Vertical Pod Autoscaler), which re-creates the pods with lower resources assigned. This works with StatefulSets too, but it is a far more disruptive operation than scaling horizontally. Also, there are some details around StatefulSets and volumes provisioned in different AZs for HA which affect VPA operations. Reference. VPA is also a separate component to be run and managed within the cluster.
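For context, a `PreStop` hook is just a per-container handler in the Pod spec. A hypothetical drain-on-shutdown sketch is below (the script path and image tag are made up); as noted above, it would still fire on liveness failures and preemption, which is why hooks alone are insufficient here:

```yaml
# Hypothetical: attempt a TServer drain via PreStop.
spec:
  containers:
    - name: yb-tserver
      image: yugabytedb/yugabyte:latest
      lifecycle:
        preStop:
          exec:
            command: ["/bin/sh", "-c", "/opt/drain-tserver.sh"]
```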
The recommended approach here is to implement this logic in the operator, as every product has its own way of dealing with this.
User Interface
When a user wants to scale up from 3 tablet servers to, let's say, 6 tablet servers, the specification of the CR will change from the source to the target spec as shown below. The change can be made by a Git-based pipeline or by applying the changed CR via kubectl.
Source Spec:
Target Spec:
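As an illustration of the change (the field layout is an assumption about the ybcluster CRD, not a verified schema), the edit amounts to bumping the tserver replica count:

```yaml
# Hypothetical source spec:
spec:
  tserver:
    replicas: 3

# Hypothetical target spec:
spec:
  tserver:
    replicas: 6
```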
Changes to Operator
Currently, updating the tablet server StatefulSet applies the new spec but does not do anything additional to the database itself.
Reference: https://github.com/yugabyte/yugabyte-operator/blob/master/pkg/controller/ybcluster/ybcluster_update_controller.go#L63-L87
The overall update will involve a few phases, which can later be expanded to accommodate other use cases too. Depending on the detected change, the relevant phase will be called and executed. Specifically, taking the scale-down operation as an example:
Testing & Acceptance
Progress and acceptance of the operation can be verified from the `status` field of the CR.

Limitations/Future work
The `kubectl scale` approach is neither recommended for StatefulSets nor will it work in this scenario. In the future we could potentially have a kubectl plugin for running `yb-admin` commands interactively. But the general practice is to apply such changes through a Git change, so that history is maintained and the state of the cluster stays consistent with the source code.

References: