Closed ajax-koval-d closed 1 month ago
By default it's shuffled and by default on reconnect, the current server is put on the bottom after shuffle.
As it turns out, you can provide your own complete ServerPool implementation and pass it to Options via the serverPool(...)
builder method.
If you provide your own implementation, you can extend the NatsServerPool as a starting point. If you make changes that work for you, you are welcome to offer a PR with changes for the system. If you do that, please make sure that the current behavior is the default and you can only modify the behavior in the default implementation with connection Options. The other thing I suppose we could do is provide a full different implementation in our Orbit extension library.
@scottf, Thanks for quick response!
I'd like to clarify a bit my idea: what I want is to call Collections.shuffle
in a middle of a forceReconnect
, right before reconnectImpl
, but without putting current server on the bottom, like this. I don't want to change NatsServerPool
or reconnectImpl
logic in any way, so standard behavior of reconnect, when connection, for example, disconnected due to network issues will be kept intact - currently connected server will be last on the list.
I believe in that case there's no need to introduce different option, because it will only affect forceReconnect
call, and to me it seems strange to opt in for unfair distribution here. But if you feel strong about making a separate option, I can provide new implementation of ServerPool
and add a new method, shuffle
here, which will be NOOP in NatsServerPool
.
In any case, I'd be glad to contribute.
By the way, at first I thought about extending NatsServerPool
in my project, but unfortunately, listLock
is private in here, and getServerList
returns a copy, making it essentially impossible to reuse that class.
Ok here is the change to NatsServerPool to make it easier to extend. https://github.com/nats-io/nats.java/pull/1185
Observed behavior
After nats cluster upgrade, we end up with nats connections distributed unevenly between different nodes. To fix this, we want to use
forceReconnect
.Problem here is that node resolution in
forceReconnect
is not fair, which becomes quite evident in cluster with two nats server nodes. Nats cluster upgrade leaves all connection hanging on one node, and then after triggeringforceReconnect
on each connection, all connection just reconnect on another node, instead of distributing evenly.As I understand, reason for this is that currently connected server is guaranteed to be at the end of list in a
NatsServerPool
, meaning, that client will not try to reconnect to the currently connected server.To fix this I'd propose to shuffle list in
NatsServerPool
inside offorceReconnect
implementation.Expected behavior
After force reconnect all connections are distributed evenly.
Server and client version
nats-server: v2.10.14 nats.java client: 2.9.1
Host environment
No response
Steps to reproduce
ignoreDiscoveredServers=true
Next, simulating nats server upgradeforceReconnect
on all opened connectionsHere's the test case (
testRebalanceWithForceReconnect
) that demonstrates the issue.