opensearch-project / opensearch-sdk-java

OpenSearch SDK to build and run extensions
Apache License 2.0
28 stars 58 forks

[RFC] Allow Extension SDK to be stateless to support Serverless (FaaS) extensions #746

Open dbwiddis opened 1 year ago

dbwiddis commented 1 year ago

Is your feature request related to a problem?

The current design of extensions requires the Extension's server to maintain a connection with OpenSearch, either on a remote node or on the same node(s) as OpenSearch.

However, many use cases for extensions involve stateless processing of data, such as language analyzers, which normalize text and return a TokenStream. When this processing does not happen within OpenSearch, it would ideally make use of serverless/FaaS capabilities such as AWS Lambda, Azure Functions, Google Cloud Functions, OCI Functions, OpenFaaS, etc.

Extensions should support stateless connections; in fact, they are already almost there, and the API could be adjusted to make maintaining state optional. This would greatly benefit the flexibility/choice available to users.

What solution would you like?

  1. Design a handler mechanism for Extensions that parallels the existing Transport handlers.
    • Conceptually these are very similar approaches: a "named" request that goes to a particular Transport Request/Response handler, and a byte stream serializing the request and response.
    • Extensions should, for example, permit both a Transport request named internal:discovery/extensions and a REST endpoint /_extensions/_internal/discovery/extensions. Both Transport and REST handlers would feed into the same code path.
  2. Design a mechanism by which an extension can communicate the state it requires for API requests.
    • For example, a request to create an index may require the value of a timeout Setting. Presently, Extensions maintain all their own settings and subscribe to changes on OpenSearch to keep them up to date with any user changes. In a stateless environment, the API would need to communicate the current value of this setting with the request (possibly as a REST param, a header, or somewhere in the content).
    • Extensions can always query OpenSearch via REST API to obtain additional values; however, proactively sending the required data will be more efficient.
    • Extensions will generally know the original/default value of many of these settings, so what gets sent is likely some sort of "diff state" maintained by the ExtensionsManager.
  3. Introduce statistics associated with serverless function executions, giving users visibility into performance and permitting them to optimize costs.
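
To make item 1 concrete, here is a minimal sketch of a registry in which a Transport action name and a REST path resolve to the same handler. Everything in it is hypothetical: `StatelessHandlerRegistry`, `restPathFor`, and the handler shape (bytes in, bytes out) are illustration names, not part of the SDK, and the path derivation simply follows the single example given above.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical registry for illustration only; not an OpenSearch SDK class. */
class StatelessHandlerRegistry {
    // A handler takes the serialized request bytes and returns the serialized response bytes.
    private final Map<String, Function<byte[], byte[]>> byTransportAction = new HashMap<>();
    private final Map<String, Function<byte[], byte[]>> byRestPath = new HashMap<>();

    /** Assumed mapping: "internal:discovery/extensions" -> "/_extensions/_internal/discovery/extensions". */
    static String restPathFor(String transportAction) {
        return "/_extensions/_" + transportAction.replace(':', '/');
    }

    /** Registers one handler under both its Transport action name and the derived REST path. */
    void register(String transportAction, Function<byte[], byte[]> handler) {
        byTransportAction.put(transportAction, handler);
        byRestPath.put(restPathFor(transportAction), handler);
    }

    byte[] handleTransport(String action, byte[] request) {
        return dispatch(byTransportAction.get(action), action, request);
    }

    byte[] handleRest(String path, byte[] request) {
        return dispatch(byRestPath.get(path), path, request);
    }

    // Both entry points end up here, so there is a single code path per named request.
    private static byte[] dispatch(Function<byte[], byte[]> handler, String name, byte[] request) {
        if (handler == null) {
            throw new IllegalArgumentException("No handler registered for " + name);
        }
        return handler.apply(request);
    }

    public static void main(String[] args) {
        StatelessHandlerRegistry registry = new StatelessHandlerRegistry();
        registry.register("internal:discovery/extensions",
            request -> ("ack:" + new String(request, StandardCharsets.UTF_8)).getBytes(StandardCharsets.UTF_8));
        byte[] payload = "ping".getBytes(StandardCharsets.UTF_8);
        System.out.println(new String(
            registry.handleTransport("internal:discovery/extensions", payload), StandardCharsets.UTF_8));
        System.out.println(new String(
            registry.handleRest("/_extensions/_internal/discovery/extensions", payload), StandardCharsets.UTF_8));
    }
}
```

Because the handler holds no connection state, the same function could in principle be invoked from a FaaS entry point; how a real serverless runtime would hand it the request bytes is outside this sketch.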

What alternatives have you considered?

Since the "diff state" mentioned above needs to be maintained externally to a serverless function, a "thin client" in between OpenSearch and serverless extensions is needed. We could have a small extension which maintains this state, or provide this support as a native plugin rather than in core. Alternatively, if there is a move to make the ExtensionsManager itself a plugin/module, this capability could be integrated there.
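
As a rough sketch of what such a "diff state" could look like: the thin client computes only the settings whose current value differs from the default the extension already knows, sends those with the request, and the stateless function overlays them on its defaults. `SettingsDiff`, its methods, and the setting names are all hypothetical, and settings are simplified to string key/value pairs.

```java
import java.util.Map;
import java.util.Objects;
import java.util.TreeMap;

/** Hypothetical helper for illustration only; not an OpenSearch SDK class. */
class SettingsDiff {
    /** Thin-client side: keep only entries whose current value differs from the known default. */
    static Map<String, String> diff(Map<String, String> defaults, Map<String, String> current) {
        Map<String, String> changed = new TreeMap<>();
        for (Map.Entry<String, String> entry : current.entrySet()) {
            if (!Objects.equals(entry.getValue(), defaults.get(entry.getKey()))) {
                changed.put(entry.getKey(), entry.getValue());
            }
        }
        return changed;
    }

    /** Function side: overlay the received diff on the defaults to get the effective settings. */
    static Map<String, String> resolve(Map<String, String> defaults, Map<String, String> diff) {
        Map<String, String> effective = new TreeMap<>(defaults);
        effective.putAll(diff);
        return effective;
    }

    public static void main(String[] args) {
        // Setting names below are made up for the example.
        Map<String, String> defaults = Map.of("example.create_index.timeout", "30s", "example.retries", "3");
        Map<String, String> current = Map.of("example.create_index.timeout", "45s", "example.retries", "3");
        Map<String, String> toSend = diff(defaults, current);
        System.out.println("sent with request: " + toSend);
        System.out.println("effective in function: " + resolve(defaults, toSend));
    }
}
```

This version does not handle a setting that was removed or reset on the OpenSearch side; a real design would need a way to express that in the diff.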

We could consider some other storage resource accessible to the serverless functions that could maintain the state. This might be a good idea to include anyway as a future feature.

Do you have any additional context?

We should consider leveraging other frameworks which may have solved many pieces of this puzzle, for example Apache EventMesh.

We can also consider other communications methods beyond (Netty) Transport and HTTP, such as gRPC.

dbwiddis commented 1 year ago

Inevitable question: "Should this be posted on OpenSearch repo to get more eyes on it?"

Answer: Yes, I plan to post a more detailed proposal there eventually. This is an initial "do you think this is a good idea" and "do you have suggestions for things I can look into" before creating a more detailed RFC there, so a slightly narrower focus is intended.

dblock commented 1 year ago

  1. Is there a category of fast lived extensions that can be satisfied with a version in which all required state is sent as an opening request, and all updates are returned as a response? An extension that calls an API exposed in the SDK would either 1) get the data it expects as part of startup and return immediately, 2) make a remote call to the host, or 3) fail. This can even be dynamically warmed up by combining 1 + 2.
  2. I like the idea of leveraging an existing framework to exchange state behind the scenes.
  3. Eventually, to gain any semblance of acceptable latency, I think this communication cannot be chatty and needs to look like fast request/response.
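
The three outcomes in item 1 could be sketched roughly as below. This is one possible reading, not an agreed design: `RequestScopedState` and the `remoteLookup` callback are hypothetical names, and "warming up by combining 1 + 2" is interpreted here as remembering a remotely fetched value so later lookups in the same invocation stay local.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

/** Hypothetical per-invocation state holder for illustration only; not an OpenSearch SDK class. */
class RequestScopedState {
    private final Map<String, String> state;
    private final Function<String, Optional<String>> remoteLookup;

    /**
     * @param provided     state sent along with the opening request
     * @param remoteLookup stand-in for a remote call back to the host for a missing key
     */
    RequestScopedState(Map<String, String> provided, Function<String, Optional<String>> remoteLookup) {
        this.state = new HashMap<>(provided);
        this.remoteLookup = remoteLookup;
    }

    String get(String key) {
        // 1) The value came with the opening request (or was fetched earlier): return immediately.
        String local = state.get(key);
        if (local != null) {
            return local;
        }
        // 2) Otherwise make a remote call to the host, and remember the answer (the "1 + 2" warm-up).
        Optional<String> remote = remoteLookup.apply(key);
        if (remote.isPresent()) {
            state.put(key, remote.get());
            return remote.get();
        }
        // 3) Neither source has the value: fail.
        throw new IllegalStateException("Required state not available: " + key);
    }
}
```

Counting how often step 2 is reached would also be one way to measure how "chatty" a given extension is in practice.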

dbwiddis commented 1 year ago

  1. Is there a category of fast lived extensions that can be satisfied with a version in which all required state is sent as an opening request, and all updates are returned as a response?

I think the Language Analyzers and other "processing" types of extensions fit this nicely. Basically, the narrower the extension's focus, the more likely it is to work like this.

dblock commented 1 year ago

@dbwiddis What's the typical latency of (time spent in) a language analyzer? If it's "long enough", I am going to claim that the overall throughput can be improved by remoting all processing to an actual Lambda that is "infinitely" scalable (trading networking latency for more CPU).