opensearch-project / opensearch-sdk-java

OpenSearch SDK to build and run extensions
Apache License 2.0
28 stars 59 forks source link

[BUG] SDKClient implementations are not thread safe and are difficult to close #612

Open dbwiddis opened 1 year ago

dbwiddis commented 1 year ago

What is the bug?

The various clients exposed by SDKClient include this warning in the initialize() methods:

The user is responsible for calling {@link #doCloseJavaClients()} when finished with the client

This is possible with the SDKRestClient which delegates its close() method to the internal RestHighLevelClient and implements Closeable. Users can simply use a try-with-resources to wrap their usage of the client.

However, a "responsible" user can't really do that safely for the Java Clients (OpenSearchAsyncClient and OpenSearchClient) because:

How can one reproduce the bug?

Call SDKClient.initializeJavaClient and initializeJavaAsyncClient from separate threads using different host/port options, and watch chaos commence. (Actually I think even connecting with the same host/port will create a different connection and could interrupt in-progress calls. And this may be the source of some flaky results we're seeing.)

What is the expected behavior?

Users have the ability to use try-finally in a Rest Handler implementation:

try (OpenSearchClient client = someMethod()) {
  // do stuff
} finally {
  // some method that safely closes client
}

Alternately, enforce the singleton nature of each client and do not share the same RestClient between multiple clients and provide an easier closing method than the existing call to close all clients.

Do you have any additional context?

We will soon be adding headers to the client RequestOptions (See #430) and these need to be per-call, requiring additional localization of these transport calls and a desire for open/close in the same Rest Handler.

I am not sure of the best approach but I'm certain that what we have is not it.

kokibas commented 1 year ago

Hello @dbwiddis , I suggest using a singleton for each client to avoid sharing the same RestClient between multiple clients. This will allow closing the RestClient only once when all clients using it are finished.

dbwiddis commented 1 year ago

Hello @dbwiddis , I suggest using a singleton for each client to avoid sharing the same RestClient between multiple clients. This will allow closing the RestClient only once when all clients using it are finished.

That's one possible solution, but they aren't implemented as such.

That also complicates things like adding multiple configurations (in theory we should be able to connect to two different OpenSearch clusters and talk to both of them).

Another option is to just create one client per initialize request and make sure it only has its own resources (no shared RestClient), and manage closing it better, such as using finalizers.

saratvemulapalli commented 1 year ago

So ultimately we are working around this in AD work by initializing a single Sync OR Async client at the Extension level and passing it around. This works effectively but we never close it per the docs.

From the problem, do you think OpenSearchAsyncClient OpenSearchClient support multi threading, i.e fire multiple requests at the same time and handle it seamlessly ?

If so, it definitely makes sense to pass it around and initialize once. I would see it as once per SDK and let any number of threads access it. As part of the Extension interface, we can probably expose a new interface tearDown which internally tears down all resources cleanly from SDK which should include closing the clients.

dbwiddis commented 1 year ago

If so, it definitely makes sense to pass it around and initialize once.

After posting this, the discussion on #616 made me wonder if we should be dealing with these clients in the SDK at all. Do we want to make a hard dependency on both sync and async clients in the API?