scylladb / alternator-load-balancing

Various tricks, scripts, and libraries, for load balancing multiple Alternator nodes
Apache License 2.0

Do not include hosts that are not available/ready #21

Closed wpaven closed 3 months ago

wpaven commented 4 months ago

When using the Java client library with a cluster that is being scaled out, the library sees nodes that are still joining but not yet available. Attempts to connect to the joining node(s) throw errors such as:

Caused by: software.amazon.awssdk.core.exception.SdkClientException: Unable to execute HTTP request: Connect to 10.151.0.68:8000 [/10.151.0.68] failed: Connection refused

This can leave applications unable to recover from errors. Can the client library include only available hosts in the host list it uses for connections?

mykaul commented 4 months ago

What's the impact of those errors? Who's seeing them? Isn't it a temporary issue?

nyh commented 4 months ago

In theory, this can be fixed in the load balancing library: it can double-check every node it gets to see whether it's really responsive. But I think it is cleaner to fix it in Scylla, which shouldn't put not-yet-ready nodes in the list in the first place. I opened an issue in core Scylla: https://github.com/scylladb/scylladb/issues/19694.
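
For illustration only, the client-side "double-check every node" idea could look roughly like the sketch below. It is not code from this repository or from Scylla. The names `is_reachable` and `filter_reachable` are hypothetical, and it assumes that a successful TCP connect to the node's Alternator port is an adequate readiness signal, which only rules out the "Connection refused" case reported above.

```python
import socket
from typing import Iterable, List, Tuple

# A node is a (host, port) pair, e.g. ("10.151.0.68", 8000).
Node = Tuple[str, int]


def is_reachable(node: Node, timeout: float = 0.5) -> bool:
    """Return True if a TCP connection to the node can be opened.

    Assumption: "accepts TCP connections" is treated as "ready".
    A joining node that refuses connections is reported as unreachable.
    """
    try:
        with socket.create_connection(node, timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False


def filter_reachable(nodes: Iterable[Node], timeout: float = 0.5) -> List[Node]:
    """Keep only the nodes that currently accept connections, in order."""
    return [node for node in nodes if is_reachable(node, timeout)]
```

A client could run such a filter on each refreshed node list before using it, at the cost of one extra connection attempt per node per refresh. That overhead is part of why fixing the list at the source is the cleaner option.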

nyh commented 4 months ago

I sent a Scylla patch for this in https://github.com/scylladb/scylladb/pull/19725, so I think we should eventually close this load-balancer issue as WONTFIX.

nyh commented 3 months ago

I think this was fixed by the Scylla fix linked above. If you disagree, please reopen this issue.