Open ruigyang-wish opened 8 months ago
@andrross Thanks for the triage. I investigated this issue further, and there seems to be no elegant way for the RestClient to tell whether the passed host is a load balancer. Of course, we could expose another parameter in RestClientBuilder that lets users indicate whether the host is a load balancer.
@ruigyang-wish Can you pass the same endpoint multiple times so it appears to the client like there are multiple hosts to retry across? Or does it dedupe?
Do other transports in https://github.com/opensearch-project/opensearch-java fix this problem?
@andrross I also considered this approach, but OpenSearch uses a map to track the dead hosts, so duplicates are collapsed. https://github.com/opensearch-project/OpenSearch/blob/main/client/rest/src/main/java/org/opensearch/client/RestClient.java#L139
private final ConcurrentMap<HttpHost, DeadHostState> denylist = new ConcurrentHashMap<>();
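To illustrate why repeating the endpoint does not help, here is a minimal sketch. It is not the RestClient code: `DenylistDedupeSketch` and `distinctDeadHosts` are hypothetical names, a plain `String` stands in for `HttpHost`, and another `String` stands in for `DeadHostState`. Because the map is keyed by host, marking the same endpoint dead several times leaves a single entry.

```java
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

final class DenylistDedupeSketch {
    // Marks every given host as dead and returns how many distinct entries the map holds.
    // The String key is an assumed stand-in for HttpHost (which also compares by value).
    static int distinctDeadHosts(List<String> hosts) {
        ConcurrentMap<String, String> denylist = new ConcurrentHashMap<>();
        for (String host : hosts) {
            denylist.put(host, "dead"); // the value stands in for DeadHostState
        }
        return denylist.size();
    }

    public static void main(String[] args) {
        String lb = "https://search.example.com:443"; // placeholder endpoint
        System.out.println(distinctDeadHosts(List.of(lb, lb, lb)));
    }
}
```

Passing the same endpoint three times therefore still looks like one host once it has been marked dead.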
Is your feature request related to a problem? Please describe
At present, we are using AWS-managed OpenSearch, and we use the AWS endpoint, such as https://vpc-57v4bbnpjsz6gmcmhoi2ca.us-west-1.es.amazonaws.com/, as the OpenSearch host. There are actually several nodes behind the endpoint. Sometimes the cluster may be very busy and one of the OpenSearch server nodes returns 502 Bad Gateway, and then the job crashes. Below is the call stack; we didn't observe significant CPU/memory usage issues at that time.
According to the OpenSearch code, the client marks the host as dead if the server returns 502 Bad Gateway, then tries to forward the request to the other available hosts. So if we pass only the load balancer URL of our OpenSearch cluster, there is no other host to try, and the program crashes without retrying.
https://github.com/opensearch-project/OpenSearch/blob/main/client/rest/src/main/java/org/opensearch/client/RestClient.java#L386-L393
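The behavior described above can be modeled with a small sketch. This is a simplified illustration, not the actual RestClient implementation: `FailoverSketch`, `perform`, and the `send` callback are hypothetical, and only 502 is treated as retryable because that is the status reported in this issue.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.ToIntFunction;

final class FailoverSketch {
    // Only 502 is treated as retryable in this sketch (assumption based on the issue text).
    static boolean isRetryStatus(int status) {
        return status == 502;
    }

    // Tries each host once, in order. A retryable status marks the host dead and
    // moves on to the next host. Returns the last status seen.
    static int perform(List<String> hosts, ToIntFunction<String> send, Set<String> dead) {
        int last = -1;
        for (String host : hosts) {
            last = send.applyAsInt(host);
            if (!isRetryStatus(last)) {
                return last;
            }
            dead.add(host); // no second attempt against the same host
        }
        return last;
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Hypothetical backend: the first request gets 502, later ones succeed.
        ToIntFunction<String> flaky = host -> calls[0]++ == 0 ? 502 : 200;
        System.out.println(perform(List.of("load-balancer"), flaky, new HashSet<>()));
    }
}
```

With a single load balancer entry the loop runs once, so a transient 502 is surfaced to the caller even though a second attempt would have reached a healthy node.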
Describe the solution you'd like
So I suggest the OpenSearch client expose two parameters for each OpenSearch host:
1. Whether the host is a load balancer or not;
2. If the host is the URL of a load balancer, a second parameter indicating how many times we can retry before marking it as dead.
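A rough sketch of what the requested retry threshold could look like follows. Everything here is hypothetical: `LoadBalancerRetrySketch`, `performWithRetries`, and `maxRetriesBeforeDead` are illustrative names and do not exist in RestClient or RestClientBuilder today.

```java
import java.util.function.IntSupplier;

final class LoadBalancerRetrySketch {
    // Sends the request up to (1 + maxRetriesBeforeDead) times against the same endpoint.
    // Returns the first non-502 status, or 502 once the attempts are exhausted, which is
    // the point where a client would mark the host as dead.
    static int performWithRetries(IntSupplier send, int maxRetriesBeforeDead) {
        int status = send.getAsInt();
        for (int retry = 0; retry < maxRetriesBeforeDead && status == 502; retry++) {
            status = send.getAsInt();
        }
        return status;
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Hypothetical load balancer: the first two requests hit a busy node.
        IntSupplier send = () -> calls[0]++ < 2 ? 502 : 200;
        System.out.println(performWithRetries(send, 2));
    }
}
```

Setting the threshold to 0 reproduces the current behavior, where the first 502 is final.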
Related component
Clients
Describe alternatives you've considered
Alternatively, I suggest exposing one parameter that lets the OpenSearch client user set the maximum number of retries before marking a host as dead.
Additional context
NA