WesleyW opened this issue 5 years ago
Three parts to this:

(1) Translate `--listen_addr` and `--join_addr` to the corresponding master and tserver flags.

NODE1: `yugabyte-db start --listen=NODE1_ip` translates to:

```
yb-master --master_addresses=NODE1_ip --rpc_bind_addresses=NODE1_ip --server_broadcast_addresses=NODE1_ip --replication_factor=1
yb-tserver --tserver_master_addrs=NODE1_ip ...
```

NODE2: `yugabyte-db start --listen=NODE2_ip --join=NODE1_ip` translates to:

```
yb-master --master_addresses=...
yb-tserver --tserver_master_addrs=NODE1_ip:7100,NODE2_ip
```
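For concreteness, a hedged sketch of what the NODE2 translation might expand to, assuming the master list simply grows to include both nodes (the exact `--master_addresses` value for NODE2 is not spelled out above):

```sh
# Hypothetical NODE2 expansion; ports and flags follow the NODE1 example above.
yb-master --master_addresses=NODE1_ip:7100,NODE2_ip:7100 \
          --rpc_bind_addresses=NODE2_ip \
          --server_broadcast_addresses=NODE2_ip
yb-tserver --tserver_master_addrs=NODE1_ip:7100,NODE2_ip
```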
(2.1) When the second node is brought up, perform an ADD_SERVER on the master:

```
yb-admin --master_addresses=NODE1_ip change_master_config ADD_SERVER NODE2_ip 7100
```

(2.2) When the second node is brought up, add NODE2 to the yugaware imported universe via a YW API call: `add_node_to_imported(new_node_ip)`.

(3) When a third node is added, do the same as above, plus `add_node_to_imported(new_node_ip)` would also issue a replication factor change that is the equivalent of

```
yb-admin --master_addresses=NODE1_ip:7100 modify_placement_info <default_placement_info> 3
```

to change the replication factor to 3 (see the sketch after this list).
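Putting (2.1) and (3) together, bringing up a third node would amount to something like the following yb-admin sequence (a sketch; `<default_placement_info>` stays whatever placement the imported universe uses):

```sh
# Add NODE3's master to the Raft quorum:
yb-admin --master_addresses=NODE1_ip:7100 change_master_config ADD_SERVER NODE3_ip 7100
# Bump the replication factor of the default placement to 3:
yb-admin --master_addresses=NODE1_ip:7100 modify_placement_info <default_placement_info> 3
```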
Observations:
There are quite a few corner cases that would not work with this approach. For now, the goal is to get to a working 3-node cluster in the first cut of multi-node and then iterate on it to improve resilience.
We could just not start masters for nodes 4 and above. Or use shell masters once that side is more resilient.
Minor: In this scheme, tservers are only aware of masters on NODE1 and, possibly, their own node. A v2 task would be for yugabyte-db to periodically collect the full list of master_addresses so it could start tservers properly with the full master list once they are all available (see the sketch below). An alternative is to fix this on the tserver side itself by persisting the full master info once known; masters already do this.
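As a rough illustration of that v2 task, `yugabyte-db` could periodically ask the current quorum for the full master list and rewrite its conf; a minimal sketch, assuming a made-up conf location and leaving the output parsing out:

```sh
# Hypothetical refresh loop; /opt/yugabyte/conf/master_addresses is an
# invented path used only for illustration.
while sleep 60; do
  yb-admin --master_addresses="$(cat /opt/yugabyte/conf/master_addresses)" \
           list_all_masters
  # ...parse the output and rewrite the conf file if the list changed...
done
```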
I was referred here by https://docs.yugabyte.com/latest/quick-start/create-local-cluster/docker/. It states multi-node clustering is not available yet. What does that mean? Is YB still constrained to one instance? What is the status of multi-node?
@raarts the above comment refers specifically to the yugabyted server, which is a new "parent" server for the two YugabyteDB servers (yb-tserver, the data server, and yb-master, the metadata server).
YugabyteDB can indeed be deployed as a multi-node cluster -- the following tutorials show you how. Note that these tutorials create the multi-node cluster using a different approach than yugabyted. https://docs.yugabyte.com/latest/explore/linear-scalability/docker/ https://docs.yugabyte.com/latest/explore/fault-tolerance/docker/
From our discussion, the following is how preliminary multi-node will look for `yugabyte-db`:

New flags being added

- `--listen-addr LISTEN_ADDR`: Public address on which `yugabyte-db` will listen. This is used for the master/tserver address flags (see the translation note after the steps below).
- `--join HOST[:PORT]`: Attempt to join a quorum with a `yugabyte-db` process on `HOST`, and a custom `PORT` if the standard ports are not used. Will fail with a graceful error message if `yugabyte-db start` was already run on this node ("yugabyte-db seems to have been installed on this node. Run with the `--force` flag to destroy existing data.").
- `--rf, --replication-factor REPLICATION_FACTOR`: Universe replication factor. Must be set on the master-leader node. Default = 1.
- `--force`: Destroy all data from an existing `yugabyte-db` install. Required before performing a join if the node is already initialized. NOOP if already part of a universe.

How users would add nodes
1. Run `yugabyte-db start --listen-addr LISTEN_ADDR` on the first node. This is the master-leader. An optional `--rf REPLICATION_FACTOR` flag may be added if desired.
2. Run `yugabyte-db start --listen-addr LISTEN_ADDR --join NODE1`, where `NODE1` is the `LISTEN_ADDR` from step 1. If the node already exists, add a `--force` flag to destroy existing data and join the cluster.
3. Run `yugaware` on either node to interact with the universe.

Step (1) translates to starting up yb-master with `--master_addresses=NODE1 --rpc_bind_addresses=0.0.0.0 --server_broadcast_addresses=NODE1` and yb-tserver with the same arguments (the public IP NODE1 is not part of the local network interfaces).
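To make the flow concrete, a sketch using the proposed flags (addresses are made up; none of this is shipped behavior yet):

```sh
# Node 1: the master-leader, with an optional replication factor.
yugabyte-db start --listen-addr 10.0.0.1 --rf 3
# Node 2: join via node 1's LISTEN_ADDR.
yugabyte-db start --listen-addr 10.0.0.2 --join 10.0.0.1
# Node 3 had prior data, so wipe it and join.
yugabyte-db start --listen-addr 10.0.0.3 --join 10.0.0.1 --force
```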
How `--join` works

`--join` must be done with an empty (e.g. no data directory) node; otherwise, the `--join` operation will fail. You can force it to delete the existing data and perform the `--join` operation by running `yugabyte-db start --join N1 --force`.

Assume we have nodes N1, N2 and N3, and that we have started the cluster on the first node N1 by running `yugabyte-db start --rf 3`. To add N2 and N3 to this cluster, you would run `yugabyte-db start --join N1` on these nodes.

The following steps would happen on N2 and N3:
1. The master address is determined to be `N1:7100`. This is written as the master_addresses flag to the local yugabyte-db conf file.
2. `yb-master` is started in shell mode on N2.
3. `yb-tserver` will start up pointing to `N1:7100` as the master_addresses.
4. `yugaware` is started, pointing to the local `yb-tserver` as the DB.
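Roughly, the processes brought up on N2 by the steps above would look like this (a sketch; the data directory is hypothetical):

```sh
# Shell mode: started without --master_addresses, added to the quorum later
# by the change config step below.
yb-master --fs_data_dirs=/mnt/d0 &
# Points at the existing master until the full list is known.
yb-tserver --fs_data_dirs=/mnt/d0 --tserver_master_addrs=N1:7100 &
```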
The add node REST API on N1 would do the following:
- Change the master quorum from `N1:7100` to `N1:7100,N2:7100,N3:7100` using a `yb-master` change config API.
- Update the `master_addresses` flag on the three nodes to the above. Ideally this needs to move into the database and can no longer be a part of the conf file. A short-term fix can be to have the `yugabyte-db` instances poll for these changes. This should be under a flag which can be turned off if needed.
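The quorum change is the same operation shown earlier; as a sketch, the REST API would effectively perform:

```sh
# Add N2's and N3's masters to the quorum one at a time:
yb-admin --master_addresses=N1:7100 change_master_config ADD_SERVER N2 7100
yb-admin --master_addresses=N1:7100,N2:7100 change_master_config ADD_SERVER N3 7100
```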
TODOs

- Support this via `yb-admin` as well. Not necessary for the first iteration.
- `yugabyte-db` will call home from every node, not just the master-leader. This may cause overlaps.
- The `master_addresses` flag needs to move into the database and can no longer be a part of the conf file.