[Docs] Lack of documentation for safely performing cluster balancing activities

yugabyte / yugabyte-db

YugabyteDB - the cloud native distributed SQL database for mission-critical applications.

Description

The developer docs lack guidance for safely performing cluster balancing activities on a YugabyteDB cluster, such as:

Scale-out
Scale-in
Node upgrades
Replacing a failed node

In particular, there is little information on gracefully handling scenarios that involve data movement, such as replacing a failed node. For these, we should provide a step to blacklist the node so that its replicas are moved off gracefully before the node is removed. For other scenarios (e.g., node upgrade or patching), where the node is expected to come back quickly after maintenance, we should add a step to blacklist leaders on that node. This helps avoid failures for in-flight requests that the client does not retry during the activity.
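As a rough illustration of the steps being requested, here is a minimal Python sketch that only builds the relevant `yb-admin` command lines. It assumes `yb-admin` is on the `PATH`, that masters listen on the default RPC port 7100 and tservers on 9100, and that the subcommands `change_blacklist`, `change_leader_blacklist`, `get_load_move_completion`, and `get_leader_blacklist_completion` behave as described in the `yb-admin` reference. The helper names and IP addresses are hypothetical, and the exact flag spelling should be checked against the docs for the version in use.

```python
from typing import List

YB_ADMIN = "yb-admin"  # assumption: the binary is on PATH


def yb_admin_cmd(master_addresses: str, *args: str) -> List[str]:
    """Build an argv list for a yb-admin invocation (does not run it)."""
    return [YB_ADMIN, "-master_addresses", master_addresses, *args]


def drain_replicas_cmd(masters: str, tserver: str) -> List[str]:
    """Blacklist a tserver so its replicas are moved off before removal."""
    return yb_admin_cmd(masters, "change_blacklist", "ADD", tserver)


def replica_drain_progress_cmd(masters: str) -> List[str]:
    """Ask how far the data move off blacklisted nodes has progressed."""
    return yb_admin_cmd(masters, "get_load_move_completion")


def drain_leaders_cmd(masters: str, tserver: str) -> List[str]:
    """Blacklist leaders only, for short maintenance such as patching."""
    return yb_admin_cmd(masters, "change_leader_blacklist", "ADD", tserver)


def leader_drain_progress_cmd(masters: str) -> List[str]:
    """Ask how far the leader move off blacklisted nodes has progressed."""
    return yb_admin_cmd(masters, "get_leader_blacklist_completion")


def restore_leaders_cmd(masters: str, tserver: str) -> List[str]:
    """Remove the leader blacklist once the node is back in service."""
    return yb_admin_cmd(masters, "change_leader_blacklist", "REMOVE", tserver)


if __name__ == "__main__":
    # Hypothetical addresses, for illustration only.
    masters = "10.0.0.1:7100,10.0.0.2:7100,10.0.0.3:7100"
    node = "10.0.0.5:9100"
    for cmd in (
        drain_leaders_cmd(masters, node),
        leader_drain_progress_cmd(masters),
        restore_leaders_cmd(masters, node),
    ):
        print(" ".join(cmd))
```

Each list can be passed to `subprocess.run(cmd, check=True)` on a host that can reach the masters. For a node replacement, the replica blacklist would be added first and the node removed only after the load move reports completion. For patching, the leader blacklist would be added before the restart and removed afterwards.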

Warning: Please confirm that this issue does not contain any sensitive information

[X] I confirm this issue does not contain any sensitive information.

Hi @shivangraina

For now we've used the https://docs.yugabyte.com/preview/manage/change-cluster-config/ page as a way to include them all. This shows how to change every node of the cluster (masters & tservers).

There are also some pages on https://docs.yugabyte.com/preview/troubleshoot/cluster/

> The developer docs don't have much information for gracefully handling such scenarios that include the movement of data such as (replacing a failed node).

https://docs.yugabyte.com/preview/troubleshoot/cluster/replace_tserver/ & https://docs.yugabyte.com/preview/troubleshoot/cluster/replace_master/

> For other scenarios (ex: node upgrade/patching) where the expectation is that the node will come back again quickly after performing the maintenance, we should add a step to blacklist leaders on this node.

I assumed the restart is very fast and best done in place. Existing connections will be lost, then retried automatically by the clients, and will work.

cc @hari90 ?

[Docs] Lack of documentation for safely performing cluster balancing activities #22980
