muff1nman opened this issue 4 years ago
@muff1nman Thanks for bringing your need to our attention. We have been discussing it internally but haven't found any solution yet -- pods in the same StatefulSet being able to discover each other directly is how Kubernetes works, and that is exactly how we have designed YugabyteDB to work. Given that there is no global DNS (that can handle such pod-to-pod discovery) across multiple Kubernetes clusters, we had found that our documented approach was the simplest way to get such a global DNS.
Couple of questions:
cc @rkarthik007 @iSignal
Hi @muff1nman,
The two key requirements for getting YugabyteDB running across multiple k8s clusters are:
A few questions:
I would rather use a deployment with LoadBalancer type services to assign dedicated ips to endpoints that need to be exposed across clusters.
YugabyteDB expects almost all nodes (pods in the case of k8s) to be able to communicate with one another using IP addresses (over TCP). This is true for both master and tserver pods. Typically, we have observed that many deployments are not able to stand up that many load balancers (one LB per pod, exposing each pod to every other pod). Is this something that you would be able to achieve in your environment?
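For reference, exposing each pod through its own LoadBalancer Service would look roughly like the sketch below. The Service name is a hypothetical placeholder; the statefulset.kubernetes.io/pod-name label is set automatically on StatefulSet pods and lets a Service select a single pod:

apiVersion: v1
kind: Service
metadata:
  name: yb-master-0-lb        # hypothetical name; one such Service per pod to be exposed
spec:
  type: LoadBalancer
  selector:
    statefulset.kubernetes.io/pod-name: yb-master-0   # selects exactly one StatefulSet pod
  ports:
    - name: rpc
      port: 7100
      targetPort: 7100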
Also, are you thinking about using the node port for this?
Are you running on GKE/Google Cloud or your own private data center?
Private datacenter on bare metal.
Would a VM based deployment be an option so that we can brainstorm a solution w/o Kubernetes as an additional layer of complexity?
VM deployments are not an option.
YugabyteDB expects almost all nodes (pods in the case of k8s) to be able to communicate with one another using IP addresses (over TCP). This is true for both master and tserver pods. Typically, we have observed that many deployments are not able to stand up that many load balancers (one LB per pod, exposing each pod to every other pod). Is this something that you would be able to achieve in your environment?
The number of load balancer IPs is not a limitation. For a minimal install, I figured at least three IPs were needed for the masters, and then potentially another three for the tservers.
Also, are you thinking about using the node port for this?
Node ports are not an option.
The number of load balancer IPs is not a limitation. For a minimal install, I figured at least three IPs were needed for the masters, and then potentially another three for the tservers.
Got it, this makes sense and would work even currently. Could you please try using server_broadcast_addresses with both yb-master and yb-tserver?
The server_broadcast_addresses parameter is used to specify the public IP or DNS hostname of the server.
In the case of running inside k8s, this is the load balancer address. An example is shown in this blog post on using DNS names for communication between nodes/pods. The reference in docs is here.
Note that this requires you to know the LB IP address (or dns if it exists) ahead of time to enable stable identities for the various masters. Does this work?
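For example (reusing the load balancer DNS name from your snippet; any stable hostname that resolves to the LB would do), a master pod fronted by a load balancer could advertise itself with:

- "--server_broadcast_addresses=yb-master-blue.example.com:7100"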
Could you please try using server_broadcast_addresses with both yb-master and yb-tserver?
That is what I tried doing as seen in the manifests above. The addresses resolved to the corresponding LB ips. It does not work due to the aforementioned check to ensure the bind address matches the broadcast address (which will not happen with LoadBalancer Services).
Got it, thanks for patiently working through this with me @muff1nman - I do see now that you were a bunch of steps ahead in your question :)
In this snippet you posted above:
- "--server_broadcast_addresses=yb-master-blue.example.com:7100"
- "--rpc_bind_addresses=0.0.0.0:7100"
- "--master_addresses=yb-master-black.example.com:7100, yb-master-blue.example.com:7100, yb-master-white.example.com:7100"
This is the right thing to do, because the rpc_bind_addresses parameter expects to resolve to a network interface to bind and listen on for RPCs. If you just want it to bind on a particular interface, you would need something like:
- "--server_broadcast_addresses=yb-master-blue.example.com:7100"
- "--rpc_bind_addresses=$(POD_IP):7100"
- "--master_addresses=yb-master-black.example.com:7100, yb-master-blue.example.com:7100, yb-master-white.example.com:7100"
- "--server_broadcast_addresses=$(HOSTNAME).yb-tservers.$(NAMESPACE).svc.cluster.local:9100"
- "--rpc_bind_addresses=$(HOSTNAME).yb-tservers.$(NAMESPACE).svc.cluster.local"
- "--cql_proxy_bind_address=$(HOSTNAME).yb-tservers.$(NAMESPACE).svc.cluster.local"
- "--enable_ysql=true"
- "--pgsql_proxy_bind_address=$(POD_IP):5433"
I believe that while the server_broadcast_addresses is specified correctly, the rpc_bind_addresses and cql_proxy_bind_address would need to be either 0.0.0.0:9100 or $(POD_IP):9100, depending on what you are trying to achieve.
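As a concrete sketch of that suggestion (keeping your broadcast address as-is and binding the RPC and proxy endpoints on the pod IP; the ports shown are the defaults, 9100 for RPC, 9042 for YCQL, 5433 for YSQL), the tserver args might look like:

- "--server_broadcast_addresses=$(HOSTNAME).yb-tservers.$(NAMESPACE).svc.cluster.local:9100"
- "--rpc_bind_addresses=$(POD_IP):9100"
- "--cql_proxy_bind_address=$(POD_IP):9042"
- "--enable_ysql=true"
- "--pgsql_proxy_bind_address=$(POD_IP):5433"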
Please let me know if that works for you. Also, I do think this needs to be documented better. Could you please open a github issue for that?
cc @iSignal @bmatican
Yes, I imagine the tserver manifest needs some work, however, I couldn't even get the master pods to stay up and establish a quorum. I'm not very familiar with the architecture yet, but I imagine the masters being in quorum is a prerequisite for the tservers to function?
@muff1nman : yes, we should focus on getting the master quorum to work before working on tservers.
Building on what Karthik suggested above, I would recommend the following command line flags for the master pods. Please let me know how this works:
1. Set --rpc_bind_addresses=$(POD_NAME).yb-masters.$(NAMESPACE).svc.cluster.local:7100. This is the default setting in our helm charts and yaml files (see https://raw.githubusercontent.com/yugabyte/yugabyte-db/master/cloud/kubernetes/yugabyte-statefulset.yaml). You can also use POD_IP, but we typically don't because it is not stable across pod movements/restarts.
2. Set --server_broadcast_addresses=load_balancer_ip:7100 (or a DNS name that resolves to it).
3. Set --use_private_ip=zone. This assumes that your master pods are in multiple zones.
4. Set --master_addresses="{private1:7100,public1:7100},{private2:7100,public2:7100},...", where private1 is the same as the rpc_bind_addresses value in step (1) and public1 is the load_balancer_ip specified in step (2). A combined sketch follows this list.
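Putting the four steps together, a rough sketch of the master container args could look like the following. The load balancer hostnames are taken from your earlier snippet; the per-cluster namespaces (yb-black, yb-blue, yb-white) are hypothetical placeholders for whatever each cluster actually uses:

- "--rpc_bind_addresses=$(POD_NAME).yb-masters.$(NAMESPACE).svc.cluster.local:7100"
- "--server_broadcast_addresses=yb-master-blue.example.com:7100"
- "--use_private_ip=zone"
- "--master_addresses={yb-master-0.yb-masters.yb-black.svc.cluster.local:7100,yb-master-black.example.com:7100},{yb-master-0.yb-masters.yb-blue.svc.cluster.local:7100,yb-master-blue.example.com:7100},{yb-master-0.yb-masters.yb-white.svc.cluster.local:7100,yb-master-white.example.com:7100}"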
Make sure you delete PVCs associated with any older runs each time you try different values for these parameters.
@muff1nman were you able to try the changes @iSignal suggested in the previous comment? It would be great if you could let us know whether you are still blocked.
The multi-K8S example deployment on GKE uses cross-pod traffic between clusters. This is atypical/nonstandard behavior for multiple K8S clusters. I would rather use a deployment with LoadBalancer type services to assign dedicated IPs to the endpoints that need to be exposed across clusters.
I have tried this with the following modified manifest (modified accordingly for each cluster):
However, the master is unable to start up; here is a snippet from the logs:
Looking at the code: https://github.com/yugabyte/yugabyte-db/blob/e4f4e6f4f77d4507b3ec53c5b51cd4886bb7206b/src/yb/master/catalog_manager.cc#L1244-L1274 it looks like this is due to the fact that the RPC address isn't in the list of resolved master addresses. However, this is expected for a LoadBalancer type setup, as the master addresses will resolve to the LoadBalancer IP while the container will not be able to listen on that IP address (hence why I've used 0.0.0.0:7100). Can this check be optionally skipped? Will it impact logic further down the line?