pingidentity / ldapsdk

UnboundID LDAP SDK for Java
Other
327 stars 79 forks source link

RoundRobinServerSet bias when some servers unavailable #118

Open tvernum opened 2 years ago

tvernum commented 2 years ago

I noticed this issue while upgrading from 4.0.8 to 6.0.2, and it looks like it's a consequence of 2270056aa8d2a24c195606d55c955e38f4d50870

The way RoundRobinServerSet picks a server is balanced when all servers are available, but can end up with significant bias when a server fails and is added to the blacklist.

Supposed we have a server set with entries:

4 servers, split across 2 availability zones.

When all 4 are available, RoundRobinServerSet will loop through them evenly and distribute the load.

But if the network connection to Zone 1 drops, then the first 2 servers become unavailable, and the response from getConnection will bias (heavily) to the first entry after the failed severs (in our example ldap03).

This is because the nextSlot field doesn't care about availability, and the process by which RoundRobinServerSet looks for an alternative server always selects the next available server. Consequently, in our example:

The result is that 75% of connections use ldap03 and 25% use ldap04

This isn't simple to fix, given the desire for high concurrency (per 2270056aa8d2a24c195606d55c955e38f4d50870) but one option would be to implement the following algorithm:

That should succeed in skipping a run of failed servers without a significant impact on concurrency.

I'd be happy to pull together a patch, but I take it you don't want that.

dirmgr commented 2 years ago

Thanks for reporting this. I've just committed a change that should address it.