
Dynamic NATS cluster (clustering using single DNS record) #712

Open abhirockzz opened 5 years ago

abhirockzz commented 5 years ago

Feature Request

Use Case

Currently, NATS cluster configuration is 'static' in nature: one needs to specify explicit IP addresses via the CLI or a config file. Allow dynamic NATS cluster formation to enable 'true' elastic scalability (especially in cloud environments).

Proposed Change:

In addition to static cluster configuration:

  1. provide the ability to plug in additional discovery mechanisms, along with the required configuration parameters
  2. expose this via an SPI so that custom implementations can be developed

Some implementations could be provided by default, e.g. for Kubernetes, Docker Swarm, etc.
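
Purely as an illustration, such discovery configuration might look something like the sketch below; the discovery block and all of its options are hypothetical and do not exist in the server today.

    cluster {
      port: 6222
      # hypothetical pluggable discovery block -- not a real server option
      discovery {
        provider: "kubernetes"   # hypothetical built-in provider
        namespace: "default"     # provider-specific parameter
      }
    }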

Who Benefits From The Change(s)?

Anyone deploying/managing NATS in a dynamic/cloud environment where the underlying infrastructure (VMs, containers, etc.) is ephemeral.

By having a pluggable cluster-node discovery mechanism, it's possible to:

  1. easily run on different cloud providers by switching the discovery mechanism (via config)
  2. use different service discovery backends (cloud or on-prem), e.g. ZooKeeper, Consul, etcd

Example

The Hazelcast Discovery SPI is a good example: it provides static and multicast-based cluster discovery out of the box and exposes others via its SPI, which has multiple implementations.

wallyqs commented 5 years ago

Besides static config, the server has cluster auto-discovery mechanisms, so that as you add nodes to your topology the servers become aware of the other nodes and form the mesh dynamically.

Cluster Options:
        --routes <rurl-1, rurl-2>    Routes to solicit and connect
        --cluster <cluster-url>      Cluster URL for solicited routes
        --no_advertise <bool>        Do not advertise known cluster IPs to clients
        --cluster_advertise <string> Cluster URL to advertise to other servers
        --connect_retries <number>   For implicit routes, number of connect retries
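
For concreteness, a minimal sketch of that flow (the hostnames node1-node3 are assumed): node2 and node3 only solicit the seed, then learn about each other through gossip via implicit routes.

    # node1 (seed): no routes needed
    gnatsd -p 4222 -cluster nats://node1:4248

    # node2 and node3 solicit only the seed and discover each other implicitly
    gnatsd -p 4222 -cluster nats://node2:4248 -routes nats://node1:4248
    gnatsd -p 4222 -cluster nats://node3:4248 -routes nats://node1:4248
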
abhirockzz commented 5 years ago

This is what I am referring to as static. E.g. if node1 is started as follows: gnatsd -p 4222 -cluster nats://node1:4248, how will it be possible for the system to seamlessly scale by adding a second node? Knowledge of the seed node (node1 in this case) is required so that the second node can be started like this: gnatsd -p 4222 -cluster nats://node2:4248 -routes nats://node1:4248

For subsequent nodes, we can continue to reference -routes nats://node1:4248 (but the failure of node1 can affect further scale-out) or use all of the previously started nodes (same problem of having to know all the nodes).

Same thing goes for scaling down...

I was thinking of an auto-pilot mode where nodes would form a cluster based on the discovery implementation.

derekcollison commented 5 years ago

There are many ways to do this, but essentially it comes down to how you define the seed node. It could be DNS, even if the node is moving; there could even be multiple entries for a single DNS name. You could also use your own mechanism for resolving the seed node to an IP for new servers coming online. Any new server just needs to connect to one server in the existing cluster, and everything else will be automatic.
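
For instance (a sketch; the name nats-seed.example.com and the addresses are made up), a single DNS name with multiple A records can serve as the seed:

    ; hypothetical zone entries: one seed name, several addresses
    nats-seed.example.com.  60  IN  A  10.0.0.1
    nats-seed.example.com.  60  IN  A  10.0.0.2

    # every server, new or restarting, solicits the same name
    gnatsd -cluster nats://0.0.0.0:4248 -routes nats://nats-seed.example.com:4248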

zerjioang commented 5 years ago

A simple approach in situations where nodes are on the same LAN (or a datacenter's private network) could be to use an SSDP-style protocol for automatic discovery. It is just simple multicast UDP communication.

derekcollison commented 5 years ago

Nothing that involves multicast with multiple or an unknown number of network elements is simple.

However, if it works for you, feel free to use it.

NATS clusters can self-heal and change dynamically thanks to the concept of a seed node. This seed node should in fact be a hostname that can resolve to as many IPs as you want for high availability, using any mechanism applicable for the job. NATS will do the rest from there in terms of dynamic cluster changes and client updates.

mjrist commented 2 years ago

@derekcollison I'm also looking for a more dynamic way to cluster NATS servers (running on Kubernetes). I want to be able to run one server per node (a Kubernetes DaemonSet) as a NATS cluster. My cluster size can change, so ideally I would avoid the need for predefined seed server(s). A static list of cluster routes in the config also doesn't work, as my cluster doesn't have a predefined node count.

> This seed node should in fact be a hostname that can resolve to as many IPs as you want for high availability.

I tried using a hostname that resolves to the IPs of all existing NATS servers (using a headless service in Kubernetes). Example config:

    cluster {
      port: 6222
      routes [
        nats://nats-route.default.svc.cluster.local:6222
      ]
    }

DNS lookup:

# host nats-route.default.svc.cluster.local 10.43.0.10
Using domain server:
Name: 10.43.0.10
Address: 10.43.0.10#53
Aliases:

nats-route.default.svc.cluster.local has address 10.42.4.14
nats-route.default.svc.cluster.local has address 10.42.0.22
nats-route.default.svc.cluster.local has address 10.42.3.16
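
For context, the kind of headless Service that produces this DNS behavior looks roughly like the sketch below (the app: nats selector label is an assumption):

    # headless Service sketch: clusterIP: None makes the name resolve
    # directly to every pod IP instead of to a single virtual IP
    apiVersion: v1
    kind: Service
    metadata:
      name: nats-route
      namespace: default
    spec:
      clusterIP: None
      selector:
        app: nats        # assumed pod label
      ports:
        - name: cluster
          port: 6222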

There are three problems:

  1. On resolving the hostname, NATS appears to use only one of the returned addresses to attempt the cluster connection.
  2. In my case there is no distinction between seed servers and non-seed servers. The IPs of all server instances are returned when looking up nats-route.default.svc.cluster.local. If the new NATS server that is attempting to join the cluster uses its own IP, it will not join the cluster.
  3. Even if the NATS server attempting to join did not use its own IP, you can end up with segmented clusters.

When attempting route connections, if NATS simply used all of the IPs returned from the DNS lookup, the problems listed above would be solved. This would allow for dynamic clusters that rely only on DNS for route discovery.

Is it possible to make this change to NATS?

(Note: seems similar to this issue for the client https://github.com/nats-io/nats.js/issues/249)

derekcollison commented 2 years ago

Good question for @wallyqs

wallyqs commented 2 years ago

Maybe a periodic timer to re-check the results of the DNS name would help, so that it eventually converges. Something like that would, I think, simplify clustering down to a single entry name.

mjrist commented 2 years ago

Yes that would be great!

Right now it looks like it just attempts the first IP. So, as I understand it, it would need to:

  1. Attempt all of the returned IPs as routes.
  2. Like you say, periodically re-check DNS, as the set of IPs can/will change.
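
For illustration, here is a minimal Go sketch of those two steps. It is not the server's actual implementation, and the hostname, port, and 30-second interval are assumptions:

    // Sketch: expand one route hostname into a route URL per resolved
    // address, and re-check DNS on a timer.
    package main

    import (
        "log"
        "net"
        "strconv"
        "time"
    )

    // resolveRoutes returns one route URL per A/AAAA record behind host.
    func resolveRoutes(host string, port int) ([]string, error) {
        addrs, err := net.LookupHost(host) // all records, not just the first
        if err != nil {
            return nil, err
        }
        routes := make([]string, 0, len(addrs))
        for _, addr := range addrs {
            routes = append(routes, "nats://"+net.JoinHostPort(addr, strconv.Itoa(port)))
        }
        return routes, nil
    }

    func main() {
        ticker := time.NewTicker(30 * time.Second) // re-check interval (assumed)
        defer ticker.Stop()
        for {
            routes, err := resolveRoutes("nats-route.default.svc.cluster.local", 6222)
            if err != nil {
                log.Println("route lookup failed:", err)
            } else {
                // a real server would diff this set against its current
                // route connections and reconcile the difference
                log.Println("routes:", routes)
            }
            <-ticker.C
        }
    }
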
NOBLES5E commented 1 year ago

Would love to see this. Any plans for this feature? @wallyqs

theoilie commented 1 year ago

Quick bump and question here for @wallyqs - I really agree with point 2 from @mjrist. It would be really great to have out-of-the-box support for having no distinction between seed server and non-seed server. Is this something that's already possible without any complex custom logic?

maxpert commented 11 months ago

I've been waiting for this for a while now. I had thrown in the towel because the lack of easy peer discoverability via DNS in environments like fly.io was choking Marmot, so today I merged and tested in production a basic A/AAAA record lookup + flattening mechanism. I am more than happy to open a PR against NATS upstream as well, because the moment I used it I realized how great a convenience this is. Right now I've explicitly opted for a dns scheme to isolate the DNS lookup part, but one can easily imagine this being applied to nats://... as well. I can totally see some people wanting to support TXT records for this purpose too, but that's an evolutionary change.

mjrist commented 11 months ago

@maxpert this is great. I would love to see this in NATS!

In our case the NATS cluster can change dynamically: servers can be added and removed, or their IPs can change. It would be great if peer discovery could run periodically (at a configured interval) instead of just at startup. If the peers change at any point, the config can be hot-reloaded.
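
One pattern along those lines (a sketch; the helper script is hypothetical) is an external watcher that re-resolves DNS, rewrites the routes block, and then triggers the server's built-in config reload via SIGHUP or the --signal flag:

    # hypothetical watcher step: re-resolve DNS and rewrite the routes block
    ./regenerate-routes.sh nats.conf
    # then ask the running server to hot-reload its configuration
    nats-server --signal reload=<pid>    # or: kill -SIGHUP <pid>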

maxpert commented 11 months ago

> @maxpert this is great. I would love to see this in NATS!
>
> In our case the NATS cluster can change dynamically: servers can be added and removed, or their IPs can change. It would be great if peer discovery could run periodically (at a configured interval) instead of just at startup. If the peers change at any point, the config can be hot-reloaded.

Funny you mentioned it! I am in the middle of that PR, just dealing with a few annoyances.

Edit: I had to deal with some additional stuff, but I am testing my changes and have a PR in place.

maxpert commented 11 months ago

Seems like there is a deeper issue that needs to be solved. I am getting this error:

config reload not supported for jetstream storage dir

I believe we need a different solution here. It turns out that, right now, new nodes coming up will automatically talk to the other nodes and have themselves recognized as members via gossip. The only problem left for Marmot is removing nodes that are gone, so that the logs are not flooded with contact errors for dead nodes.