Open: shivangraina opened this issue 1 month ago
Hi @shivangraina
For now, we've used the https://docs.yugabyte.com/preview/manage/change-cluster-config/ page as the single place that covers all of these activities. It shows how to change every node of the cluster (masters and tservers).
There are also some related pages under https://docs.yugabyte.com/preview/troubleshoot/cluster/
The developer docs don't have much information on gracefully handling scenarios that involve moving data, such as replacing a failed node.
See https://docs.yugabyte.com/preview/troubleshoot/cluster/replace_tserver/ and https://docs.yugabyte.com/preview/troubleshoot/cluster/replace_master/
For other scenarios (e.g., node upgrades or patching), where the node is expected to come back quickly after the maintenance, we should add a step to blacklist leaders on this node.
I assumed the restart is very fast and best done in place. Existing connections will be lost, but clients will retry them automatically and continue to work.
cc @hari90 ?
Description
For performing cluster balancing activities on the YugabyteDB cluster such as:
The developer docs don't have much information on gracefully handling scenarios that involve moving data, such as replacing a failed node. Here we should provide a step to blacklist the node so that its replicas are moved off gracefully before the node is removed. For other scenarios (e.g., node upgrades or patching), where the node is expected to come back quickly after the maintenance, we should add a step to blacklist leaders on that node. This will help avoid failures of in-flight requests that the client does not retry during the activity.
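A minimal Python sketch of the two proposed steps, assuming the `yb-admin` subcommands `change_blacklist`, `change_leader_blacklist`, `get_load_move_completion` and `get_leader_blacklist_completion`, and assuming the completion commands print a line of the form `Percent complete = N ...`. The helper names (`drain_tserver`, `move_leaders_off`, `restore_leaders`) are hypothetical and not part of YugabyteDB. The command runner is injectable so the sequence can be checked without a live cluster; verify the exact subcommands and output against the `yb-admin` reference for your version.

```python
import re
import subprocess
import time
from typing import Callable, List, Sequence

# A runner takes an argv list and returns the command's stdout.
Runner = Callable[[Sequence[str]], str]


def run_yb_admin(argv: Sequence[str]) -> str:
    """Run a yb-admin command (assumed to be on PATH) and return its stdout."""
    result = subprocess.run(list(argv), capture_output=True, text=True, check=True)
    return result.stdout


def yb_admin(master_addrs: str, *args: str) -> List[str]:
    """Build a yb-admin argv; master_addrs is a comma-separated host:port list."""
    return ["yb-admin", "-master_addresses", master_addrs, *args]


def parse_percent(output: str) -> float:
    """Extract N from 'Percent complete = N ...' (assumed output format)."""
    match = re.search(r"Percent complete\s*=\s*([0-9.]+)", output)
    if match is None:
        raise ValueError(f"unexpected yb-admin output: {output!r}")
    return float(match.group(1))


def wait_until_complete(runner: Runner, argv: Sequence[str],
                        poll_seconds: float = 5.0,
                        timeout_seconds: float = 3600.0) -> None:
    """Poll a completion command until it reports 100%, or raise TimeoutError."""
    deadline = time.monotonic() + timeout_seconds
    while parse_percent(runner(argv)) < 100.0:
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{' '.join(argv)} did not reach 100%")
        time.sleep(poll_seconds)


def drain_tserver(master_addrs: str, tserver: str,
                  runner: Runner = run_yb_admin, poll_seconds: float = 5.0) -> None:
    """Node removal: blacklist the tserver, then wait until its replicas have moved."""
    runner(yb_admin(master_addrs, "change_blacklist", "ADD", tserver))
    wait_until_complete(runner, yb_admin(master_addrs, "get_load_move_completion"),
                        poll_seconds)


def move_leaders_off(master_addrs: str, tserver: str,
                     runner: Runner = run_yb_admin, poll_seconds: float = 5.0) -> None:
    """Short maintenance: leader-blacklist the tserver, then wait for leaders to move."""
    runner(yb_admin(master_addrs, "change_leader_blacklist", "ADD", tserver))
    wait_until_complete(runner, yb_admin(master_addrs, "get_leader_blacklist_completion"),
                        poll_seconds)


def restore_leaders(master_addrs: str, tserver: str,
                    runner: Runner = run_yb_admin) -> None:
    """After maintenance: remove the tserver from the leader blacklist."""
    runner(yb_admin(master_addrs, "change_leader_blacklist", "REMOVE", tserver))
```

For a patch or upgrade, the intended order would be `move_leaders_off(...)`, restart the node, then `restore_leaders(...)`; for a permanent removal, `drain_tserver(...)` runs before the node is stopped. The `host:port` values here (for example a tserver RPC address such as `10.0.0.5:9100`) are illustrative only.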