[RFC] Strengthen System Index Protection in the Plugin Ecosystem

cwperks commented 5 months ago

Summary

Today, when a plugin stashes the ThreadContext the plugin is in a "trusted" mode where it can perform any action on a cluster (analogous to sudo). Conventionally, stashing the ThreadContext is used when handling API requests in order for a plugin to read or write to its system indices. The plugin ecosystem relies on good intentions to ensure that plugins do not act rogue and perform malicious actions that threaten the integrity of the system (such as deleting another plugin's system index). There should be a better mechanism for controlling what plugins can do after stashing the ThreadContext, perhaps even limiting the actions that a plugin can perform to only index operations with system indices owned by the plugin.

Details

The goal of this project is to restrict what plugins can do inside of a block where the ThreadContext has been stashed. The proposal is to limit actions a plugin can perform in this block to only operations related to its own system indices. Currently, plugins act as root and can operate unrestricted.

pseudo-code of a plugin’s API Handler:

public class IndexResourceTransportAction extends HandledTransportAction<IndexResourceRequest, IndexResourceResponse> {
    private static final Logger LOG = LogManager.getLogger(IndexResourceTransportAction.class);
    private final Client client;
    private final TransportService transportService;
    private final Settings settings;

    @Inject
    public IndexAnomalyDetectorTransportAction(
        TransportService transportService,
        ActionFilters actionFilters,
        Client client,
        Settings settings
    ) {
        super(IndexResourceAction.NAME, transportService, actionFilters, IndexResourceRequest::new);
        this.client = client;
        this.transportService = transportService;
        this.settings = settings;
    }

    @Override
    protected void doExecute(Task task, IndexResourceRequest request, ActionListener<IndexResourceResponse> actionListener) {
        // authenticated user context

        // Without stashing the thread context, transport actions executed here are within the currently authenticated user context
        // Plugins sometimes read user info from the ThreadContext before stashing. 
        // This is used for API Handlers that create scheduled jobs
        // The plugins persist the user read in from the ThreadContext in their own system index
        // User user = getUserContext(client);
        String resourceName = request.getResourceName();
        try (ThreadContext.StoredContext context = client.threadPool().getThreadContext().stashContext()) {
            // root context

            // Operations on system index here
            // Index resource into plugin's system index
        } catch (Exception e) {
            LOG.error(e);
            listener.onFailure(e);
        }
    }
}

This project proposes to bring plugin awareness to the ThreadContext so that whenever the ThreadContext is stashed, a header must be populated that contains the current plugin’s name. The security plugin can then leverage this header during privilege evaluation to determine if the current plugin is the owner of the (system) indices being acted upon. If not, the security plugin will block the request. No other transport actions will be allowed, only index operations on indices belonging to the plugin.

To accomplish this, the security plugin will also need to become aware of the installed plugins and their system indices. Currently, the security plugin maintains a list of System Indices and requires plugin teams submit Pull Requests to the security repo when adding a new System Index in order for the security plugin to properly protect the system index. The security plugin gives System Indices special protections which means that they cannot be created or deleted by regular users (including admin).

Additional Considerations:

The job-scheduler plugin needs to be able to read other plugins’ system indices to get schedule information for scheduled jobs

Implementation Details

To implement a solution will require a couple of components:

Introduction of a concept of an ExecutionContext which is associated with the ThreadContext and accepts a single value intended to hold a reference to the plugin that takes over execution. This value could either contain the plugin unique name or the full ClassName to the plugin’s entrypoint. See https://github.com/cwperks/OpenSearch/pull/172/files for proof-of-concept
Every single instance of core delegating to a plugin should be wrapped with a call to populate the execution context. When control is returned back to core from the plugin, the context should be cleared. If a plugin tries to override the context, it should fail similar to how overriding an existing threadcontext header fails.
1. A potential solution to this problem could be the “proxy” pattern: https://stackoverflow.com/a/37702319
  1. Example implementation that shows how the ActionPlugin interface can be proxied: https://github.com/cwperks/OpenSearch/pull/173
  2. Another example of “proxy” pattern, but in this case its the RestHandlers being proxied: https://github.com/cwperks/OpenSearch/pull/172
    1. The RestHandlers need to be proxied separately, because once registered the extension points in ActionPlugin.java are no longer called and instead these handlers are called directly from the RestController here.
2. Ideally, every instance of core delegating to a plugin should get wrapped, however, API handling is the most important to target for this change initially. Since the Security plugin already wraps every RestHandler, it may be possible to isolate changes to the Security plugin to populate the execution context before delegating to the original handler and clearing upon return. (See https://github.com/opensearch-project/security/blob/main/src/main/java/org/opensearch/security/filter/SecurityRestFilter.java#L190)
  1. One of the problems with this approach is that the security plugin is not aware of which plugin has registered the RestHandler. To solve this, the RestHandler may need to be modified to associate a plugin identity with the RestHandler
The 3rd implementation detail is for the Security plugin to implement a SystemIndexAccessEvaluator which requires obtaining a map of plugin’s and their system indices ({pluginId} → Set{Index Patterns})
1. Formally, plugins that register system indices should be implementing the SystemIndexPlugin extension point. See Security’s implementation for an example: https://github.com/opensearch-project/security/blob/main/src/main/java/org/opensearch/security/OpenSearchSecurityPlugin.java#L2030-L2038
2. The systemIndexDescriptorMap is assembled in core here. This map should be shared with the security plugin, but not be shared with other plugins. To do this, a security specific extension point can be added. See the implementation of ActionPlugin.getRestHandlerWrapper for an instance of a security specific endpoint. A node fails to boot if 2 different plugins try to implement the same extension point.
After obtaining the map of system indices, the security plugin can implement a new access evaluator similar to SecurityIndexAccessEvaluator. This evaluator will read from the thread context to see if 1) User info is not populated in the thread context (meaning thread context is stashed) and 2) read the execution context to see which plugin is currently executing code. If the action being executed is an operation on a system index owned by the plugin then its permitted, otherwise rejected.
The logic in the SecurityFilter needs to change not to automatically skip privilege evaluation in blocks where the thread context is stashed. If the execution context is populated and no user info exists in the ThreadContext it should block all transport actions except for operations on a system index owned by the plugin initiating the request.
1. The only exception to this rule is allowing job-scheduler to read from other plugins’ job indices. Job-scheduler actively listens to index operations on other plugins system indices that store information about scheduled jobs. For example, if the information about the scheduling of a job changes (i.e. interval every 10 min → interval every hour), job-scheduler listens to changes in documents in other plugins’ system indices to react to the changes.
The security plugin can stop manually maintaining a list of system indices: https://github.com/opensearch-project/security/blob/main/src/main/java/org/opensearch/security/tools/democonfig/SecuritySettingsConfigurer.java#L49-L80

Note: This change could be breaking if there are plugins that do other transport actions in a block where the ThreadContext is stashed. This change is intentional to ensure that only index operations on system indices owned by the plugin that’s currently stashed the ThreadContext are permitted. If there is a necessity to run other actions within a block where the ThreadContext is stashed, we can consider implementing policy files for a plugin where a cluster admin gives specific permissions to a plugin to be able to perform. For this, a plugin identity would be analogous to a user with a single role.

Goals:

No changes in plugins (other than the security plugin) required. Changes are isolated to core + security plugin and other plugins are unaware of this change that makes the entire plugin ecosystem more secure

Mermaid Diagram (For Reference/Regen)

sequenceDiagram
    participant User
    participant RestController
    participant RestHandler

    User->>RestController: PUT _plugins/_anomalydetection/detector/create
    activate RestController
    RestController->>RestController: Get RestHandler from registry
    RestController->>RestHandler: Delegate to handleRequest
    deactivate RestController
    activate RestHandler
    RestHandler->>RestHandler: Plugin processes RestRequest
    RestHandler->>User: Send API Response

Details

In this diagram, OpenSearch acts as an HTTP server and creates a single thread to handle an API request once a request is received. When handling the request, it goes through the netty pipeline and then eventually into the RestController which is responsible for the actual handling of the request. The RestController has the full registry of RestActions (native + modules + plugins) and based on the requested API Path + Verb, it determines which RestHandler will handle the incoming request. If the RestHandler is from an API that was registered by a plugin, the RestController will call the handleRequest method of the registered RestHandler and the execution is delegated to a plugin at that point in time. Currently, there is nothing tracking where the point of execution is (i.e. the thread is executing code from the core or code from a plugin). It would be useful for the security plugin to have the execution context to differentiate whether a plugin is handling the API request or not.

cwperks commented 5 months ago

In the default distribution of OpenSearch, these are the plugins that have system indices registered via the SystemIndexPlugin.getSystemIndexDescriptors() extension point:

{
    "org.opensearch.jobscheduler.JobSchedulerPlugin":
    [
        ".opendistro-job-scheduler-lock"
    ],
    "org.opensearch.notifications.NotificationPlugin":
    [
        ".opensearch-notifications-config"
    ],
    "org.opensearch.replication.ReplicationPlugin":
    [
        ".replication-metadata-store"
    ],
    "org.opensearch.indexmanagement.IndexManagementPlugin":
    [
        ".opensearch-control-center",
        ".opendistro-ism-config"
    ],
    "org.opensearch.securityanalytics.SecurityAnalyticsPlugin":
    [
        ".opensearch-sap-threat-intel"
    ],
    "org.opensearch.search.asynchronous.plugin.AsynchronousSearchPlugin":
    [
        ".opendistro-asynchronous-search-response"
    ],
    "org.opensearch.knn.plugin.KNNPlugin":
    [
        ".opensearch-knn-models"
    ],
    "org.opensearch.security.OpenSearchSecurityPlugin":
    [
        ".opendistro_security"
    ],
    "org.opensearch.geospatial.plugin.GeospatialPlugin":
    [
        ".geospatial-ip2geo-data"
    ],
    "org.opensearch.dashboards.OpenSearchDashboardsPlugin":
    [
        ".apm-custom-link",
        ".opensearch_dashboards",
        ".apm-agent-configuration",
        ".opensearch_dashboards_*",
        ".reporting-*"
    ]
}

This is the list that security tracks: https://github.com/opensearch-project/security/blob/main/src/main/java/org/opensearch/security/tools/democonfig/SecuritySettingsConfigurer.java#L49-L80

".plugins-ml-agent",
".plugins-ml-config",
".plugins-ml-connector",
".plugins-ml-controller",
".plugins-ml-model-group",
".plugins-ml-model",
".plugins-ml-task",
".plugins-ml-conversation-meta",
".plugins-ml-conversation-interactions",
".plugins-ml-memory-meta",
".plugins-ml-memory-message",
".plugins-ml-stop-words",
".opendistro-alerting-config",
".opendistro-alerting-alert*",
".opendistro-anomaly-results*",
".opendistro-anomaly-detector*",
".opendistro-anomaly-checkpoints",
".opendistro-anomaly-detection-state",
".opendistro-reports-*",
".opensearch-notifications-*",
".opensearch-notebooks",
".opensearch-observability",
".ql-datasources",
".opendistro-asynchronous-search-response*",
".replication-metadata-store",
".opensearch-knn-models",
".geospatial-ip2geo-data*",
".plugins-flow-framework-config",
".plugins-flow-framework-templates",
".plugins-flow-framework-state"

In order for plugins to get protection from the mechanism proposed in this RFC, they will need to implement SystemIndexPlugin.getSystemIndexDescriptors

stephen-crawford commented 5 months ago

[Triage] Hi @cwperks, really nice job putting this together. This seems like a great idea to me, and would love to hear more input from others. Going to mark as triaged.

andrross commented 5 months ago

The proposal is to limit actions a plugin can perform in this block to only operations related to its own system indices.

My first thought is that this is something that the core should be enforcing directly. If "a given plugin never needs to read or write another plugin's system index" is always true, then it sounds like this should be something enforced by the plugin framework and not functionality added by the security plugin.

cwperks commented 5 months ago

Thank you for reviewing @andrross! I opened up a PR in core hoping to spark more discussion. In that PR I decided to implement an extension point to pass the system indices to the security plugin because the security plugin also has a notion of a "superuser" (think of it like the root user) when connecting with the admin certificate. This allows cluster operators to delete system indices if a cluster is in a corrupt state.

If system index protection is provided directly by core, I think a method of last resort would still be needed in case a cluster operator has to take invasive action.

One idea would be to bring the concept of "superuser" into the core and skip authorization for a user authenticated as the superuser.

kumargu commented 5 months ago

The proposal is to limit actions a plugin can perform in this block to only operations related to its own system indices

Is this not restricted by Java security manager?

cwperks commented 5 months ago

Is this not restricted by Java security manager?

No, JSM does not restrict transport actions that plugins can perform. Some examples of what JSM lets a plugin do are read or write certain properties (Example), read and write to the file system (Example) or various other actions like using reflection

There is nothing restricting what transport actions a plugin can perform in a block where the threadcontext has been stashed. With extensions, I wrote a proposal about "policies" that a cluster admin can assign to an extension that governs what actions it can perform outside of an authenticated user context (ref). The proposal in this RFC is to limit actions in a block where the threadcontext has been stashed to only interactions with a plugin's system indices. If a plugin needs to perform additional transport actions outside of an authenticated user context, then we would also need to develop a mechanism for cluster admins to define what actions the plugin is allowed to perform.

nibix commented 2 months ago

One thing which has just crossed my mind:

What about runtime dependencies available to plugins - both via the createComponents() call and via @Inject. Are there also plans to restrict the availability of certain dependencies? Otherwise, plugins might be just able to re-implement the necessary transport actions to access indices.

cwperks commented 2 months ago

@nibix See the discussion here.

Initially, I wanted to change the visibility of methods within the ThreadContext class, but took a different approach to sealing the ThreadContext class using JSM permissions. Example PR which sealed markAsSystemContext.

To seal the ThreadContext class (in 3.0.0) the plan is to put ThreadContext.stashContext() behind a JSM permission. That way plugins can still request permission to perform stashContext in the plugin-security.policy file, but the cluster admin is prompted about it ahead of time and can choose to proceed with installation or not.

Plugins included in the default distribution of OpenSearch will be encouraged to adopt the new mechanism for system index access: https://github.com/opensearch-project/opensearch-plugins/issues/238

I would like to get to a state where no plugin in the default distribution of OpenSearch needs to request permission to perform threadContext.stashContext, except for the Security plugin because the security plugin needs to be able to stash the context and insert the plugin user into the ThreadContext as proposed in https://github.com/opensearch-project/security/pull/4665

nibix commented 2 months ago

@cwperks I did not mean ThreadContext but other dependencies - for example IndicesService is available to plugins via @Inject, which again exposes IndexService, which directly gives methods to manipulate index meta data, without triggering any transport action:

https://github.com/opensearch-project/OpenSearch/blob/4963792c175ab3eb30a493e81360dce95b5b2811/server/src/main/java/org/opensearch/index/IndexService.java#L1068

cwperks commented 2 months ago

I see, we definitely need to review what is available to plugins and gate anything that could be considered a privileged operation.

andrross commented 2 months ago

To seal the ThreadContext class (in 3.0.0) the plan is to put ThreadContext.stashContext() behind a JSM permission

@cwperks The Java Security Manager is deprecated and planned to be disabled or removed in future versions of the Java runtime, right? I don't think we should make long term plans that depend on JSM functionality.

cwperks commented 2 months ago

@andrross In this comment @reta mentions the long-term solution would be to prohibit the usage of ThreadContext outside of the core using JPMS.

The usage of the ThreadContext outside of the core should not be possible. The long term solution for that is using JPMS

The changes being introduced in https://github.com/opensearch-project/OpenSearch/pull/14630 and https://github.com/opensearch-project/OpenSearch/pull/15778 will allow plugins to remove many direct usages of ThreadContext.stashContext. There are still other security use-cases where plugins access the ThreadContext directly, namely 1) scheduled job security using Roles Injection and 2) Plugin Resource access control that still need to be addressed to remove further instances of thread context access within plugins. Along with moving security into core as well.

So long-term solution is JPMS to prevent usage of the ThreadContext within plugins. We also need to figure out what the replacement for JSM is for OpenSearch, I've been watching the thread closely.

cwperks commented 1 month ago

The merge of https://github.com/opensearch-project/OpenSearch/pull/16024 does not fully resolve this issue. Re-opening.

opensearch-project / security

[RFC] Strengthen System Index Protection in the Plugin Ecosystem #4439