nats-io / nats-architecture-and-design

Architecture and Design Docs
Apache License 2.0

Allow for user customized handling via callback or language specific mechanism #107

Open scottf opened 2 years ago

scottf commented 2 years ago

Overview

Optionally, provide a mechanism for the user to override the list of URLs used for connecting or reconnecting to servers.

The user would provide a server URL list, or some way to iterate a list, that matches how the client currently goes through its list of possible servers.

As examples, the Java and .NET clients refactored their server list handling into an interface plus a default implementation, and then provided a way in the Options for the user to supply their own implementation.
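A minimal sketch of what such a refactor could look like, in Go for illustration (the interface and type names here are hypothetical, not the actual Java/.NET API): the client asks a provider for the next URL on each attempt, and the default implementation just walks a static slice the way clients do today.

```go
package main

import "fmt"

// ServerListProvider is a hypothetical interface: the client asks the
// provider for the next server URL on each connect/reconnect attempt
// instead of iterating a fixed slice itself.
type ServerListProvider interface {
	// NextServer returns the next URL to try, or ok=false when exhausted.
	NextServer() (url string, ok bool)
	// Reset restarts iteration, e.g. after a successful connect.
	Reset()
}

// defaultProvider is a minimal default implementation: walk a static
// list in order, matching how clients typically exhaust their pool.
type defaultProvider struct {
	urls []string
	idx  int
}

func (d *defaultProvider) NextServer() (string, bool) {
	if d.idx >= len(d.urls) {
		return "", false
	}
	u := d.urls[d.idx]
	d.idx++
	return u, true
}

func (d *defaultProvider) Reset() { d.idx = 0 }

func main() {
	var p ServerListProvider = &defaultProvider{
		urls: []string{"nats://a:4222", "nats://b:4222"},
	}
	for u, ok := p.NextServer(); ok; u, ok = p.NextServer() {
		fmt.Println(u)
	}
}
```

A user-supplied implementation (DNS-backed, region-aware, etc.) would satisfy the same interface and be passed in via the client's Options.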

Parity Notes

This is not strictly required for parity. It's a nice-to-have, so it can wait until a customer / user asks for it.

Clients and Tools

Other Tasks

Client authors please update with your progress. If you open issues in your own repositories as a result of this request, please link them to this one by pasting the issue URL in a comment or main issue description.

Original Text

Provide the ability to bootstrap the client connection with multiple lists of servers, representative of different regions. This would be useful where clusters are deployed in multiple regions and clients prefer to always connect to the closest region (the first list); only if the client fails on all servers in that list (and its server info) would it try the second list, and only if all of those fail would it go to the third list.

For example, consider 3 regions: east, central and west.

- East (E): [a.b.x.1, a.b.x.2, a.b.x.3]
- Central (C): [a.b.y.1, a.b.y.2, a.b.y.3]
- West (W): [a.b.z.1, a.b.z.2, a.b.z.3]

The east clients would be configured with these 3 lists in the order E, C, W, but a west client would be configured in the order W, C, E.

When connecting, the client would exhaust the first list before trying any in the second list.
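One simple way to realize this ordering is to flatten the prioritized lists into a single connect order, so the normal exhaust-the-pool loop naturally tries all of the first list before any of the second. A sketch in Go (function name and wiring are illustrative, not a client API):

```go
package main

import "fmt"

// connectOrder flattens prioritized region lists into a single ordered
// slice: a connect loop that walks the result exhausts the first list
// before trying any server in the second, and so on.
func connectOrder(lists ...[]string) []string {
	var out []string
	for _, l := range lists {
		out = append(out, l...)
	}
	return out
}

func main() {
	east := []string{"a.b.x.1", "a.b.x.2", "a.b.x.3"}
	central := []string{"a.b.y.1", "a.b.y.2", "a.b.y.3"}
	west := []string{"a.b.z.1", "a.b.z.2", "a.b.z.3"}

	// An east client is configured E, C, W; a west client W, C, E.
	fmt.Println(connectOrder(east, central, west))
	fmt.Println(connectOrder(west, central, east))
}
```

Note this flattening alone does not capture fail-back to the preferred region; that is part of why the discussion below leans toward out-of-band tooling or a callback.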

caleblloyd commented 2 years ago

Is this better managed out of band? There are lots of considerations here: upstream health checking, resolving DNS names at a specified interval, failing over, failing back, etc.

It would be fairly straightforward to implement using Envoy with priority levels:

https://www.envoyproxy.io/docs/envoy/latest/intro/arch_overview/upstream/load_balancing/priority

ColinSullivan1 commented 2 years ago

IMO, for the NATS clients simpler is better, and for some clients we could add a callback that's invoked to get the next URL, allowing customized server selection on connect/reconnect. Longer term we've discussed a high-level service/stream API that could be much more sophisticated, with the features @caleblloyd suggested.
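A sketch of what such a callback-driven connect loop could look like, assuming a hypothetical callback signature (none of these names are an actual NATS client API):

```go
package main

import "fmt"

// NextServerFunc is a hypothetical callback signature: the client calls
// it on each connect/reconnect attempt; ok=false means the pool is
// exhausted and the connect (or reconnect) attempt fails.
type NextServerFunc func(attempt int) (url string, ok bool)

// connect simulates a connect loop driven entirely by the callback.
func connect(next NextServerFunc) (string, error) {
	for attempt := 0; ; attempt++ {
		url, ok := next(attempt)
		if !ok {
			return "", fmt.Errorf("no servers left after %d attempts", attempt)
		}
		// A real client would dial url here and return on success.
		if dialOK(url) {
			return url, nil
		}
	}
}

// dialOK stands in for an actual TCP/NATS dial; it "succeeds" only for
// the second server, to exercise the fallthrough behaviour.
func dialOK(url string) bool { return url == "nats://b:4222" }

func main() {
	pool := []string{"nats://a:4222", "nats://b:4222"}
	next := func(attempt int) (string, bool) {
		if attempt >= len(pool) {
			return "", false
		}
		return pool[attempt], true
	}
	u, err := connect(next)
	fmt.Println(u, err)
}
```

Because the callback owns ordering, it can encode region priorities, custom DNS resolution, or an external discovery query without the client needing to know about any of them.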

ripienaar commented 2 years ago

I am very keen on something like the callback Colin mentions.

For me the problem is that I get the initial list from elsewhere (SRV records, Consul, etc.) and people might want to move my clients to another cluster. So they update e.g. the SRV records, but there is no way to rerun the query or update a running client.

I need to be able to update the server list periodically or, less ideally, on reconnect, on very long-running clients I do not directly control.

aricart commented 2 years ago

If anything, this should be a callback that replaces the cluster gossip behaviour. That means that if you specify a callback, the expected behaviour is that cluster updates are ignored (the authoritative server list is now the responsibility of the callback). Obtaining the list could be an expensive operation, and in some cases possibly affected by the same network outage that is forcing the services to use a different cluster.

scottf commented 2 years ago

+1 for a callback. It will allow custom implementations, including for things like specific DNS resolution.

marthaCP commented 1 year ago

Not a feature

aricart commented 1 year ago

@marthaCP why is this reopened?

marthaCP commented 1 year ago

I meant to update the title for the issue. Scott said it was still open. Maybe we should discuss at the call tomorrow (11/9/22).