opensearch-project / ml-commons

ml-commons provides a set of common machine learning algorithms, e.g. k-means, or linear regression, to help developers build ML related features within OpenSearch.
Apache License 2.0
88 stars 125 forks source link

[META] Migrate feature/multi_tenancy branch to main #2650

Open dbwiddis opened 1 month ago

dbwiddis commented 1 month ago

I will be starting the process of migrating the code developed on the feature/multi_tenancy branch to main. I'm creating this issue to establish context for reviewers and pre-answer some frequently asked questions.

Important context to understand:

  1. There are two main purposes of this branch/feature:

    • Abstract out the CRUD/S operations on system indices into a generic metadata storage option, that can use system indices, or a remote OpenSearch cluster, or DynamoDB, or (in the future) other storage options.
    • Add functionality to pass along a Tenant ID which will be used in the remote storage options to allow multiple tenants to use the same (stateless) plugin instance.
  2. There will be new packages created in the opensearch-ml-common module:, org.opensearch.sdk containing a new metadata interface and associated request/response objects, and org.opensearch.sdk.client containing a default client implementation based on NodeClient, which implements existing behavior.

    • The location of this code in ML-Commons is temporary; it will eventually be migrated somewhere else (possibly a module on the core project, possibly a separate plugin, possibly the Java SDK project).
    • The additional clients are presently in the opensearch-ml-plugin module org.opensearch.ml.sdkclient package. Like the above interfaces, this is not their final location.
  3. Because of the way the createComponents() method on the Plugin interface works (binding those instances to their runtime class, not their interface), the client will be a concrete class which delegates method calls to an internal delegate instance. This is not the ideal object-oriented structure, but seemed the best approach to simplify injection of the client into classes which are instantiated inside createComponents() and would otherwise not have access to an interface-bound instance created in a separate module.

  4. The vast majority of changes across the codebase are copy-and-paste using the following general format (Get is shown, similar for Index, Update, Delete, and Search):

Old code (some variations on wrappers or anonymous subclasses, etc.):

GetRequest getRequest = new GetRequest(<system index>).id(<doc id>).fetchSourceContext(fetchSourceContext);
client.get(getRequest, ActionListener.runBefore(ActionListener.wrap(r -> {
    // response code goes here
}, e -> {
    // exception code goes here
}), context::restore));

New code (some variations):

GetDataObjectRequest getDataObjectRequest = GetDataObjectRequest
    .builder()
    .index(<system index>)
    .id(<doc id>)
    .tenantId(tenantId)
    .fetchSourceContext(fetchSourceContext)
    .build();
sdkClient
    .getDataObjectAsync(getDataObjectRequest, client.threadPool().executor(GENERAL_THREAD_POOL))
    .whenComplete((r, throwable) -> {
        context.restore();
        if (throwable == null) {
            try {
                GetResponse gr = r.parser() == null ? null : GetResponse.fromXContent(r.parser());
                //
                // exact copy of response code from above
                //
            } catch (Exception e) {
                listener.onFailure(e);
            }
        } else {
            Exception e = SdkClientUtils.unwrapAndConvertToException(throwable);
            //
            // exact copy of exception code from above
            //
        }
    });
  1. Many unit tests needed to be adjusted to adapt to the asynchronous nature of the new client interface. Tests cases are intended to preserve exact current behavior, but some small changes are needed to adjust to the client implementation using a Future rather than an ActionListener, and the multi-threaded execution.

Current test:

public void testGet_Success() throws IOException {
    GetRequest getRequest = < instantiate test request > ;
    GetResponse getResponse = < instantiate test response > ;
    doAnswer(invocation -> {
        ActionListener<GetResponse> listener = invocation.getArgument(1);
        listener.onResponse(getResponse);
        return null;
    }).when(client).get(any(), any());

    getTransportAction.doExecute(null, getRequest, actionListener);
    verify(actionListener).onResponse(any(GetResponse.class));
}

Updated test to account for multithreading and client args

public void testGet_Success() throws IOException {
    GetRequest getRequest = < instantiate test request > ;
    GetResponse getResponse = < instantiate test response > ;

    // Equivalent of doAnswer().when()
    PlainActionFuture<GetResponse> future = PlainActionFuture.newFuture();
    future.onResponse(getResponse);
    // Note change to single-arg client.get which returns a PlainActionFuture
    when(client.get(any(GetRequest.class))).thenReturn(future);

    // The original execution while async in reality used a single-threaded doAnswer().when() mock.
    // The new execution is on another thread and can cause a race condition, so we await the completion
    // of the action listener with a latch. 
    CountDownLatch latch = new CountDownLatch(1);
    LatchedActionListener<GetResponse> latchedActionListener = new LatchedActionListener<>(actionListener, latch);
    // Same call, but using the latched listener
    getTransportAction.doExecute(null, getRequest, latchedActionListener);
    // Usually completes in a few milliseconds
    latch.await(500, TimeUnit.MILLISECONDS);

    // Remainder of test code as it previously was
    verify(actionListener).onResponse(any(GetResponse.class));
}

Expect PRs and Tasks to accomplish the following:

Add any questions/comments!

dhrubo-os commented 1 month ago

The location of this code in ML-Commons is temporary; it will eventually be migrated somewhere else (possibly a module on the core project, possibly a separate plugin, possibly the Java SDK project).

Do we have any ETA on getting the final decision on this?

Expect PRs and Tasks to accomplish the following:

Do we have any order to follow? Like may be performing sanity testing first on the branch? Also integration tests are missing here?

dbwiddis commented 1 month ago

The location of this code in ML-Commons is temporary; it will eventually be migrated somewhere else (possibly a module on the core project, possibly a separate plugin, possibly the Java SDK project).

Do we have any ETA on getting the final decision on this?

No. I've mentioned the need to decide on this "sooner rather than later" but it hasn't seemed to prompt action.

Is there a particular deadline you think is needed that we can push toward? Should we delay migrating from the feature branch until it's decided? @andrross do you have any input here?

Do we have any order to follow?

My primary goal was articulating the tasks first.

The default implementations can probably follow the sequence outlined; most testing already exists for those, and I expect we might only find a few misses.

The alternate data store implementations can / should proceed in parallel and may require a lot of new tests to be written.

Like may be performing sanity testing first on the branch?

I'd appreciate your input here as a maintainer on what you require to merge this.

It sounds like we should keep developing on the branch until we're satisfied? I thought there was a push to get this on main.

Also integration tests are missing here?

I've listed integ tests 3 times above. Is there a specific case I'm missing? To elaborate:

arjunkumargiri commented 1 month ago

+1 on reusing existing integ tests to test against remote opensearch and DDB stores.

From an ordering perspective, we should add new integ tests in feature branch before merging them to main.

dhrubo-os commented 1 month ago

From an ordering perspective, we should add new integ tests in feature branch before merging them to main.

+1

andrross commented 1 month ago

The location of this code in ML-Commons is temporary; it will eventually be migrated somewhere else (possibly a module on the core project, possibly a separate plugin, possibly the Java SDK project).

Do we have any ETA on getting the final decision on this?

No. I've mentioned the need to decide on this "sooner rather than later" but it hasn't seemed to prompt action.

Is there a particular deadline you think is needed that we can push toward? Should we delay migrating from the feature branch until it's decided? @andrross do you have any input here?

My thinking here is that there is a potential project to introduce a proper metadata abstraction into the core (for use by the core itself and plugins) as described here. The interfaces you've defined in ml-commons may well be part of that project, but I think we'd need at least a design for this work before we'd know that. Given that plugins want to move forward with this work ahead of doing the core work, I'd lean towards putting these interfaces in a library outside of the core in the short term since the core itself will not be using them.