[Meta] Plugin sandboxing: Step towards modular architecture in OpenSearch

saratvemulapalli commented 3 years ago

Problem

The plugin architecture lets developers extend the core features of OpenSearch, and several kinds of plugins are supported. However, the architecture has significant problems for OpenSearch customers. Chiefly, a plugin can fatally impact the cluster: critical workloads such as ingestion or search traffic can be disrupted because a non-critical plugin such as s3-repository failed with an exception. The problem compounds when we want to run an arbitrary plugin, because OpenSearch core and system resources are not protected well enough.

Zooming in technically, plugins run within the same process as OpenSearch. As the OpenSearch process bootstraps, it initializes PluginService.java via Node.java. All plugins are class-loaded via loadPlugin during the bootstrap of PluginService, which looks in the plugins directory and loads the classpath where each plugin jar and its dependencies are already present. During bootstrap each plugin is initialized, and plugins can use various interfaces to subscribe to state changes within the cluster, e.g., ClusterService.java.
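To make the shared-process risk concrete, here is a minimal sketch in plain Java. `HostNode`, `StateListener`, and the listener wiring are hypothetical stand-ins, not OpenSearch classes. The only point is that code loaded into the same JVM runs on the host's own threads, so an unchecked exception thrown by one subscriber surfaces in the host's call path and can starve the subscribers behind it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for an extension point a plugin subscribes to.
interface StateListener {
    void onStateChange(String newState);
}

// Hypothetical stand-in for the host process; not an OpenSearch class.
class HostNode {
    private final List<StateListener> listeners = new ArrayList<>();

    void register(StateListener listener) {
        listeners.add(listener);
    }

    // Listeners run on the caller's thread, inside the host process.
    // Returns how many listeners were notified.
    int publish(String newState) {
        int notified = 0;
        for (StateListener listener : listeners) {
            listener.onStateChange(newState); // an exception here aborts the loop
            notified++;
        }
        return notified;
    }
}

class InProcessDemo {
    public static void main(String[] args) {
        HostNode node = new HostNode();
        // A non-critical subscriber with a bug.
        node.register(state -> { throw new IllegalStateException("plugin bug"); });
        // A critical subscriber registered after it.
        node.register(state -> System.out.println("critical listener saw " + state));
        try {
            node.publish("green");
        } catch (IllegalStateException e) {
            // The failure reaches the host; the critical listener never ran.
            System.out.println("host call path failed: " + e.getMessage());
        }
    }
}
```

A host can of course wrap each call in a try/catch, but that only contains exceptions. It does not protect memory, threads, or file handles shared with the plugin, which is the motivation for moving plugins out of the process.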

Resources on the system for plugins in OpenSearch are managed via the Java Security Manager, which is initialized during the bootstrap of the OpenSearch process. Each plugin defines a security.policy file, e.g., the Anomaly Detection plugin.

As we can see, plugins are loaded into the OpenSearch process, and that is what fundamentally needs to change.

Objective

This feature enables any plugin to run safely without impacting the cluster and the system.

Design

PLEASE NOTE: THIS DOCUMENT IS WORK IN PROGRESS AND DOES NOT REPRESENT THE FINAL DESIGN.

[Diagram: plugin sandboxing today]

Requirements

TBD (define what we would like to accomplish and what is not changing in the system).

The high-level idea for plugin sandboxing is to isolate plugin interactions with OpenSearch. All interactions for plugins go through extension points. If we can modularize these extension points, I believe we can achieve isolation for plugins.

Proposal

[Diagram: plugin sandboxing, new world]

Plugins run within the OpenSearch process today. We are proposing running plugins in one of three places (thanks to dblock@):

OpenSearch process
Independent process
Remote node

We see value in offering an option to run a plugin in different parts of the system. Some plugins would run within the process (like searching, indexing), some in an independent process (like snapshot repository), and some on a remote node (like machine learning).

We will build a new Plugins Orchestrator that facilitates running plugins in all three ways. New interfaces will be defined to establish communication between an extension and OpenSearch.
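As a rough illustration of the three placements, the sketch below models them as an enum with a lookup table. The class name `PluginsOrchestrator` borrows the term from the proposal, but its methods, the plugin names, and the default of falling back to in-process are all assumptions made for this example; the actual design is still work in progress.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// The three placements named in the proposal.
enum RunMode { IN_PROCESS, INDEPENDENT_PROCESS, REMOTE_NODE }

// Hypothetical orchestrator: it only records where each plugin should run.
class PluginsOrchestrator {
    private final Map<String, RunMode> placements = new LinkedHashMap<>();

    void place(String pluginName, RunMode mode) {
        placements.put(pluginName, mode);
    }

    // Assumption: a plugin with no explicit placement keeps today's in-process behavior.
    RunMode modeFor(String pluginName) {
        return placements.getOrDefault(pluginName, RunMode.IN_PROCESS);
    }
}

class OrchestratorDemo {
    public static void main(String[] args) {
        PluginsOrchestrator orchestrator = new PluginsOrchestrator();
        // Illustrative names that follow the examples given in the text.
        orchestrator.place("snapshot-repository", RunMode.INDEPENDENT_PROCESS);
        orchestrator.place("machine-learning", RunMode.REMOTE_NODE);

        for (String name : new String[] {"search", "snapshot-repository", "machine-learning"}) {
            System.out.println(name + " -> " + orchestrator.modeFor(name));
        }
    }
}
```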

Proof of Concept

To explore this idea more, we would like to have a plugin running in an independent process.

Tracking Issues

Learn and Share:

[x] Blog Post: How do plugins work: https://github.com/opensearch-project/project-website/issues/446
[x] Identify existing extension points for plugin interfaces: https://github.com/opensearch-project/OpenSearch/issues/1573
[x] Java Security Manager: What resources does JSM protect for plugins, and what are the potential alternatives if we decide to deprecate JSM?

Milestones:

[x] Design (Proof of Concept): Build a communication mechanism to enable communication between an extension and OpenSearch. https://github.com/opensearch-project/OpenSearch/issues/2019 https://github.com/opensearch-project/OpenSearch/issues/2351
[x] OpenSearch SDK: Build an initial SDK that does the heavy lifting of developing an extension. https://github.com/opensearch-project/OpenSearch/issues/1619
[x] Extension Framework (Proof of Concept): Add extension interface support for first extension point (IndicesModule). https://github.com/opensearch-project/OpenSearch/issues/2711 https://github.com/opensearch-project/OpenSearch/issues/2691
[x] Proof of Concept: Build a lightweight extension integrating OpenSearch SDK and the first extension point (IndicesModule)
[x] Benchmarking Latency: Benchmark the lightweight extension running as a plugin vs running as an extension (independent process outside of OpenSearch). https://github.com/opensearch-project/OpenSearch/issues/2231
[x] Run AD plugin as an Extension to create a detector. https://github.com/opensearch-project/OpenSearch/issues/3635
[x] Get detector for AD Extension - https://github.com/opensearch-project/opensearch-sdk-java/issues/211
[x] Validate detector for AD Extension - https://github.com/opensearch-project/opensearch-sdk-java/issues/217
[ ] Migration: Add extension interface support for all default extension points. https://github.com/opensearch-project/OpenSearch/issues/3136
[ ] Migration: Add extension interface support for custom extension points.
[ ] Security support for extensions https://github.com/opensearch-project/security/issues/1895

Meta: https://github.com/opensearch-project/OpenSearch/issues/1632

Back Burner:

[ ] Migration: Add extension interface support to new extension points. Migrate one plugin from opensearch-project and run it as an extension. https://github.com/opensearch-project/OpenSearch/issues/3011 https://github.com/opensearch-project/OpenSearch/issues/2981

FAQ

How would an extension communicate with OpenSearch?

We are exploring a lightweight form of Transport that supports bi-directional communication. Transport is the communication mechanism OpenSearch uses between nodes.
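The issue does not specify a wire format, so the following is only a generic sketch of one common building block for this kind of channel: length-prefixed message framing. The `Frames` class, the frame layout, and the message strings are assumptions for illustration and do not describe OpenSearch's Transport protocol. Each direction of a bi-directional connection would carry its own sequence of such frames.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class Frames {
    // Frame layout (an assumption for this sketch):
    // 4-byte big-endian payload length, followed by the UTF-8 payload.
    static byte[] encode(String message) {
        byte[] payload = message.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(4 + payload.length)
                .putInt(payload.length)
                .put(payload)
                .array();
    }

    // Decodes frames in order; throws if the bytes end in the middle of a frame.
    static List<String> decodeAll(byte[] stream) {
        ByteBuffer buf = ByteBuffer.wrap(stream);
        List<String> messages = new ArrayList<>();
        while (buf.remaining() >= 4) {
            int length = buf.getInt();
            if (length < 0 || length > buf.remaining()) {
                throw new IllegalArgumentException("truncated or corrupt frame");
            }
            byte[] payload = new byte[length];
            buf.get(payload);
            messages.add(new String(payload, StandardCharsets.UTF_8));
        }
        if (buf.hasRemaining()) {
            throw new IllegalArgumentException("trailing bytes do not form a frame header");
        }
        return messages;
    }
}

class FramingDemo {
    public static void main(String[] args) {
        // Two hypothetical messages sent back to back in one direction.
        byte[] first = Frames.encode("register extension");
        byte[] second = Frames.encode("onIndicesModule");
        byte[] wire = ByteBuffer.allocate(first.length + second.length)
                .put(first)
                .put(second)
                .array();
        // The length prefix lets the receiver split the byte stream back into messages.
        System.out.println(Frames.decodeAll(wire));
    }
}
```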

How is the latency of an extension compared to a plugin loaded in memory?

AD extension latency with create-detector functionality: https://github.com/opensearch-project/opensearch-sdk-java/issues/24#issuecomment-1309547639

AD plugin latency: https://github.com/opensearch-project/opensearch-sdk-java/issues/24#issuecomment-1309588329

With an example extension point, onIndicesModule(), we see about 8-11% added latency depending on workload, and the throughput decrease is between 0.05% and 7%.

https://github.com/opensearch-project/OpenSearch/issues/2231

3012

Would the extensions framework offer both methods for extensions, i.e., same process and another process, as alternatives?

Now that we have the latency numbers, we see value in running plugins in process, and we will continue to support it for critical workloads in the querying and indexing cycle.

When will Extension framework and OpenSearch SDK be released?

We are working towards OpenSearch 3.0 for the initial framework to support extensions, and for the first version of the SDK with support for the default extension points.

Have you tried migrating a plugin into an extension?

We are working on the anomaly detector backend plugin as a prototype, running it as an extension. https://github.com/opensearch-project/OpenSearch/issues/5224

pjfitzgibbons commented 2 years ago

Could you tell us - what is the plan for "updating" a plugin? Assuming plugin-X is not fully complete on first release, how can users of the plugin update that plugin on their own instance of Core and Dashboard?

dblock commented 2 years ago

@pjfitzgibbons Extensions will work like in VSCode or any other sane system, where they will declare a minimum (and sometimes a max) version of OpenSearch required. Then you'll be able to upgrade them to a newer release assuming it's compatible with your current version of OpenSearch at runtime, without restarting a cluster. Does this answer your question?
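A minimal sketch of that kind of check, assuming a plain major.minor.patch version scheme: `SemVer` and `ExtensionManifest` are hypothetical names invented for this example, not part of OpenSearch or the SDK, and the version numbers are made up.

```java
// Hypothetical version triple; not OpenSearch's own version class.
class SemVer implements Comparable<SemVer> {
    final int major;
    final int minor;
    final int patch;

    SemVer(int major, int minor, int patch) {
        this.major = major;
        this.minor = minor;
        this.patch = patch;
    }

    static SemVer parse(String text) {
        String[] parts = text.split("\\.");
        if (parts.length != 3) {
            throw new IllegalArgumentException("expected major.minor.patch: " + text);
        }
        return new SemVer(
                Integer.parseInt(parts[0]),
                Integer.parseInt(parts[1]),
                Integer.parseInt(parts[2]));
    }

    @Override
    public int compareTo(SemVer other) {
        if (major != other.major) return Integer.compare(major, other.major);
        if (minor != other.minor) return Integer.compare(minor, other.minor);
        return Integer.compare(patch, other.patch);
    }
}

// Hypothetical manifest: a required minimum and an optional maximum version.
class ExtensionManifest {
    final SemVer minVersion;
    final SemVer maxVersion; // null means no upper bound

    ExtensionManifest(SemVer minVersion, SemVer maxVersion) {
        this.minVersion = minVersion;
        this.maxVersion = maxVersion;
    }

    // Both bounds are treated as inclusive in this sketch.
    boolean isCompatibleWith(SemVer running) {
        if (running.compareTo(minVersion) < 0) {
            return false;
        }
        return maxVersion == null || running.compareTo(maxVersion) <= 0;
    }
}

class CompatibilityDemo {
    public static void main(String[] args) {
        ExtensionManifest manifest =
                new ExtensionManifest(SemVer.parse("2.0.0"), SemVer.parse("2.9.9"));
        for (String running : new String[] {"1.3.0", "2.4.0", "3.0.0"}) {
            System.out.println(running + " compatible: "
                    + manifest.isCompatibleWith(SemVer.parse(running)));
        }
    }
}
```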

pjfitzgibbons commented 2 years ago

@dblock Yes, understood. Is there a specific task above that you believe implicitly includes upgrading functionality (or version detection or ... ?)

owaiskazi19 commented 2 years ago

Hey @pjfitzgibbons! You can find more details on API Versioning here: https://github.com/opensearch-project/OpenSearch/issues/2447
