wpaven closed this issue 3 months ago
What's the impact of those errors? Who's seeing them? Isn't it a temporary issue?
In theory, this can be fixed in the load balancing library: it can double-check every node it gets to see if it's really responsive. But I think it is cleaner to fix it in Scylla, which shouldn't put not-yet-ready nodes in the list in the first place. I opened an issue in core Scylla: https://github.com/scylladb/scylladb/issues/19694.
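For reference, the client-side double-check could look roughly like the sketch below. This is not code from the load balancing library; `ResponsiveNodeFilter`, `isResponsive`, and `filterResponsive` are hypothetical names, and it assumes a plain TCP connect to the node's port (8000 in the error above) is a good enough readiness signal. A successful connect only proves the port is open, not that the node can already serve requests.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of any existing load balancing library.
public class ResponsiveNodeFilter {

    // Returns true if a TCP connection to the node succeeds within timeoutMs.
    public static boolean isResponsive(InetSocketAddress node, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(node, timeoutMs);
            return true;
        } catch (IOException e) {
            // Connection refused, timed out, or unreachable: treat the node as not ready.
            return false;
        }
    }

    // Keeps only the nodes that currently accept connections, preserving order.
    public static List<InetSocketAddress> filterResponsive(List<InetSocketAddress> nodes, int timeoutMs) {
        List<InetSocketAddress> live = new ArrayList<>();
        for (InetSocketAddress node : nodes) {
            if (isResponsive(node, timeoutMs)) {
                live.add(node);
            }
        }
        return live;
    }
}
```

The cost of this approach is an extra connection attempt per node on every refresh of the node list, which is one more reason to prefer fixing the list at the source.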
I sent a Scylla patch for this in https://github.com/scylladb/scylladb/pull/19725, so I think we should eventually close this load-balancer issue as WONTFIX.
I think this was fixed by the Scylla fix linked above. If you disagree, please reopen this issue.
When using the Java client library with a cluster that is being scaled out, the client library sees nodes that are still joining and not yet available. Attempts to connect to the joining node(s) throw errors:
Caused by: software.amazon.awssdk.core.exception.SdkClientException: Unable to execute HTTP request: Connect to 10.151.0.68:8000 [/10.151.0.68] failed: Connection refused
This can leave applications unable to recover from errors. Can the client library include only available hosts in the host list it uses for connections?