[FEATURE] Decouple security configuration from user data storage

peternied commented 1 year ago

Is your feature request related to a problem? Initially reported in https://github.com/opensearch-project/security/issues/1842

Unfortunately, the architecture of the security configuration, on update a full recreation of all configuration derived objects is performed. This is an issue because the internal IdP shares the same configuration as the security config.

The implication of this is that if any information about a user is changed, such as a password being modified or a permission added the plugin is reloaded on all nodes.

What solution would you like? The internal identity provider should have its configuration data stored in a separate configuration index from the configuration values that cause the whole plugin to reload itself.

cwperks commented 1 year ago

[Triage] @scrawfor99 Would you mind taking a look?

stephen-crawford commented 1 year ago

Update: I have started looking at resolving this issue and the associated one linked. I am going through the steps to reproduce the problem and then will start looking at the best solution.

Edit: I have reproduced and am now looking at InternalAuthBackend to see about splitting the configuration storage.

Proposed changes 1: For splitting, I am considering adding a usersConfig and usersConfigPath parameters to the DynamicConfigurationModel and then splitting the existing ConfigV7 & ConfigV6 so that there is UserConfigV7 which takes all their authentication-focused configurations. Then I plan to swap the internal authentication listeners so we execute on the UserConfigV7 instead of the basic configuration file. This will still execute multiple requests each time there is a configuration update but you won't reload the entire configuration on each. @peternied is this the type of resolution you had in mind?

Edit 2: Linking some design work for a potential correction of how we store configuration settings. The new intention is to go further then the proposed changes 1 and actually construct an improved storage system for settings. Linking here

peternied commented 1 year ago

I like your proposal and I would go even farther, I think we should store the new index ~in a legit metadata index instead of reusing the existing config system~. I think you could just link the other issue where you were documenting how to handle permissions (in abstract the problem is identical)

cwperks commented 1 year ago

...legit metadata index

@peternied What is a metadata index?

peternied commented 1 year ago

Finally did my homework... thanks for bearing with me

How it works today

SystemIndexPlugin is the mechanism to declare index(es) that are used by a plugin. The Security Plugin is using this capability already. The Security Plugin uses a single index, shard'ed onto every node, with N documents, a document is a json version of the yml configuration files in the repository.

ConfigurationRepository handles initialization and retrieval/caching, the schema modeling is handled by ConfigurationLoaderSecurity7. DynamicConfigFactory listens for change updates from ConfigurationRepository and then push them out via a pub/sub event bus.

AbstractApiAction is the base action for security plugin actions, for any put/delete action a saveAnUpdateConfigs(...) call puts an updates document in the index, and then TransportConfigUpdateAction fans out to all nodes so ConfigurationRepository flushes its cache and reloads.

The Security Plugin stores its configuration data on the index in a in-memory cache, which it can also use to create instances of its objects dynamically. Cache invalidation issues are avoided by invalidating the entire cache whenever a request could possibly have updated the security configuration.

Potential problem

As the amounts of mutable data grows in the Security Plugin, invalidation will happen more often and take longer to resolve. That data also has to be held in memory and higher JVM memory pressure takes down nodes.

To date I have seen several anecdotes around tight loops interacting with security objects, but not what I would classify as evidence to justify a change.

Potential next steps

How bad is the cache invalidation on the service performance, user perceptible? Is it impacted by cluster size?
If using get/search calls to an index of users/permissions, how fast is it compared to the cache?
Root cause on what is slowing down node start up (mostly see failing integration test runs), this might be the cause but maybe its something else?

stephen-crawford commented 1 year ago

@peternied thank you for your detailed notes on this. I have updated the Permission Storage design I was working on to include the parts that overlap with what you found.

Am I correct in understanding your third sentence in the "Potential Problem" section, that you would not want to split the storage apart anymore? You mention that you don't see "evidence to justify a change", so should I pivot from solving this similarly to the permission handling?

peternied commented 1 year ago

I still think we should store the user data separately, but I'm not as convinced what the drawbacks are with the existing implementation. If we wanted to explore the implementation I've highlighted how I think we could investigate it.

If we had a pull request in-hand reusing the existing storage concepts for a separate user data storage I wouldn't request changes around these design elements without more more data.

stephen-crawford commented 1 year ago

Hi @peternied, thank you for the clarification. I made some notes over on the linked document about the current limitations with the way that user data is stored from a permissions perspective. In short we have a lot of reloads of the entire cluster, and are inefficient in how they add data.

For a specific permissions-related example, there are already 5 sub-evaluators being passed to the PrivilegesEvaluator as new types of actions get added. Each of these corresponds to another plugin which needs to have its permissions stored. This already seems to be adding a lot of overhead and code debt to the process of adding plugins. If we roll out extensions and encourage people to contribute their own then we are only going to run into having more and more mini-evaluators.

I know that this thread is about more than permissions but those are currently the configurations I am most familiar with. I will start looking into some of the restrictions for other configurations and update this issue as I uncover details.

peternied commented 1 year ago

@scrawfor99 Looks like your comment was truncated Another example is

stephen-crawford commented 1 year ago

Hi @peternied, oops. I don't have another example yet but will update when I do haha.

opensearch-project / security