opensearch-project / opensearch-java

Java Client for OpenSearch
Apache License 2.0

[FEATURE] Fault tolerance #958

Open gweyeratlassian opened 2 months ago

gweyeratlassian commented 2 months ago

Is your feature request related to a problem?

There doesn't seem to be a way to configure retries in the Java client. This feature is present in the .NET client. The AWS SDK for Java 2.x retries requests as well and this behavior can be configured. I understand the situation is more complicated for OpenSearch as the SDK needs to support both Amazon OpenSearch and a self-hosted deployment.

I'm unsure of the best way to handle faults. I initially attempted to retry at the transport level, but both the Apache HttpClient 5 transport and the AWS SDK 2 transport disable retries. While it's possible to enable retries for the Apache HttpClient 5 transport, enabling them for the AWS SDK 2 transport requires mucking with the internals of the library. This would also lead to two distinct implementations.

What solution would you like?

It would be great if the team could document the SDK's failure modes and provide guidance around fault handling. Different exceptions seem to be used for similar purposes. I'm unsure what the difference is between org.opensearch.client.transport.httpclient5.ResponseException and org.opensearch.client.opensearch._types.OpenSearchException. OpenSearchException seems to be more widely used, while ResponseException seems to be specific to the Apache HttpClient 5 transport. If I'm using the Apache HttpClient 5 transport, do I need to handle both?

What alternatives have you considered?

I've decorated specific operations with Resilience4j retries, but my configuration is based on experimentation and is likely to be incomplete.
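
For illustration, the retry-decorator idea can be sketched without any dependencies, roughly as below. This is not Resilience4j's API and not anything the OpenSearch client provides: `RetrySketch`, `callWithRetry`, and all of its parameters are hypothetical names, and which exceptions count as transient is left to a caller-supplied predicate because the client's failure modes are exactly what this issue asks to have documented.

```java
import java.util.concurrent.Callable;
import java.util.function.Predicate;

// Hypothetical helper: a minimal retry wrapper with exponential backoff.
final class RetrySketch {

    private RetrySketch() {}

    /**
     * Invokes op until it succeeds, the exception is not retryable,
     * or maxAttempts calls have been made. The delay doubles after each failure.
     */
    static <T> T callWithRetry(Callable<T> op,
                               int maxAttempts,
                               long initialDelayMillis,
                               Predicate<Exception> retryable) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        long delay = initialDelayMillis;
        for (int attempt = 1; ; attempt++) {
            try {
                return op.call();
            } catch (Exception e) {
                // Give up on the last attempt or on a non-transient failure.
                if (attempt >= maxAttempts || !retryable.test(e)) {
                    throw e;
                }
                Thread.sleep(delay); // back off before the next attempt
                delay *= 2;
            }
        }
    }
}
```

A client call would be passed in as the `Callable`, with a predicate that, as an assumption, might treat `IOException` and throttling or unavailability responses as transient. Retrying non-idempotent operations this way can duplicate writes, so the wrapper is only safe around calls that tolerate being repeated.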

Do you have any additional context?

Not really.

dblock commented 2 months ago

This is a big area we'd like help in, with https://github.com/opensearch-project/opensearch-clients/issues/27 being the umbrella for it.