nibix opened 1 month ago
Please have a look at the approaches. I would be very interested in your opinions.
As mentioned above, the implementation is not complete yet. The code contains a couple of TODO comments to indicate what work needs to be done.
I would also like to discuss whether a couple of things are really necessary or whether there might be a chance to simplify the implementation by removing them.
These are:

- The ability to set `multi_rolespan_enabled` to `false`. The OpenSearch docs do not mention this flag. In my perception, there is no real use for this setting except maintaining backwards compatibility. However, for OpenSearch, the default has always been `true` since its inception. Are there really users who have it set to `false`?
I have also started to work on the micro benchmarks as discussed in #3903. The generally accepted standard for micro benchmarks in Java is the JMH framework. However, it is licensed as GPL v2 with classpath exception: https://github.com/openjdk/jmh/blob/master/LICENSE

Is the inclusion of a dependency with such a license acceptable in OpenSearch?
@cwperks Can you look into this?
We're looking into this and will get back with an answer.
> However, this is licensed as GPL v2 with classpath exception: https://github.com/openjdk/jmh/blob/master/LICENSE Is the inclusion of a dependency with such a license acceptable in OpenSearch?
The feedback we received was that the code can be used only for internal operations. Since JMH usage would be part of OpenSearch Security, my understanding is that this is not approved.
Attention: Patch coverage is 82.36220% with 112 lines in your changes missing coverage. Please review.

Project coverage is 65.37%. Comparing base (9caf5cb) to head (f653945). Report is 5 commits behind head on main.

:exclamation: Current head f653945 differs from pull request most recent head b5ff5c8. Please upload reports for the commit b5ff5c8 to get more accurate results.
@cwperks @peternied @DarshitChanpura @scrawfor99
Just FYI:
I worked a bit on the micro benchmarking part of this issue. As JMH was out due to its license, I reviewed other frameworks. It is remarkable that in most cases the frameworks' own descriptions say "rather use JMH instead of this framework".
Anyway, I tried out https://github.com/noconnor/JUnitPerf because the idea of building on the JUnit infrastructure seemed nice. The big downside of JUnitPerf is that it does not work well with parameterized JUnit tests.
See here for an example:
The high number of very similar methods is caused by the lack of parameter support; in the end, we need to test quite a few different dimensions (like the number of indices, the number of roles, etc.) on the same operation.
As I was really keen on getting some broader results, I went down the "roll your own" path and quickly threw together some naive micro benchmarking code. This is just a temporary thing and thus very messy, but it gives me some numbers. See here:
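For illustration only: the hand-rolled harness itself is linked above, but the core of such a naive micro benchmark is typically just a warm-up loop followed by a timed loop. The sketch below is hypothetical (class and method names are mine, not from this PR) and shows the general shape, including why a warm-up phase is needed on the JVM:

```java
// Hypothetical sketch of a naive micro benchmark loop; NOT the actual code from this PR.
public class NaiveBenchmark {

    /** Runs the given operation repeatedly and returns the measured throughput in operations per second. */
    public static double measureOpsPerSecond(Runnable operation, int warmupIterations, int measuredIterations) {
        // Warm-up phase: give the JIT compiler a chance to compile and optimize the hot path
        // before we start measuring; otherwise the numbers mostly reflect interpreter overhead.
        for (int i = 0; i < warmupIterations; i++) {
            operation.run();
        }
        long start = System.nanoTime();
        for (int i = 0; i < measuredIterations; i++) {
            operation.run();
        }
        long elapsedNanos = System.nanoTime() - start;
        return measuredIterations / (elapsedNanos / 1_000_000_000.0);
    }

    public static void main(String[] args) {
        double opsPerSecond = measureOpsPerSecond(() -> {
            // Stand-in for a call like privilegeEvaluator.evaluate(user, action, request)
            Math.sqrt(42.0);
        }, 10_000, 100_000);
        System.out.println("throughput: " + opsPerSecond + " ops/s");
    }
}
```

This is exactly the kind of thing JMH does much more rigorously (dead-code elimination guards, forked JVMs, statistical analysis), which is why such hand-rolled numbers should only be read as rough indicators.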
So, I ran some tests; here are some preliminary results.
Generally, the real-world meaningfulness of micro benchmarks is limited. On a full real cluster, things can look totally different due to:
On the other hand, micro benchmarks make some tests much easier. For micro benchmarking, a `Metadata` instance with 100000 indices can be mocked within a few seconds, while creating that many indices on a real cluster would take much, much longer.
Full cluster benchmarks are also coming up, but these are still in the works.
The micro benchmarks were applied to the following code:
For comparison, we also applied the micro benchmarks to the following code on the old code base:
Due to refactorings, the code looks different. However, what happens under the hood is effectively the same.
Additionally, some further code changes were necessary to make `PrivilegeEvaluator` independent of dependencies like `ClusterService`, in order to make it really unit testable/benchmarkable. I first tried to use Mockito to mock `ClusterService` instances, but had to learn that the performance characteristics of Mockito are so bad that it is unsuitable for micro benchmarking.
As we only look at the `evaluate()` method, DLS/FLS evaluation is disabled for this scope.
We tested privilege evaluation with three different actions:

- `indices:data/write/bulk[s]` with `BulkShardRequest`
- `indices:data/write/bulk` with `BulkRequest`
- `indices:data/read/search` with `SearchRequest`
We tested with these indices:

- 10 indices: `index_a0`, `index_a1`, `index_b0`, `index_b1`, `index_c0`, `index_c1`, ..., `index_e0`, `index_e1`
- 30 indices: `index_a0`, ..., `index_a5`, ..., `index_e0`, ..., `index_e5`
- 100 indices: `index_a0`, ..., `index_a19`, ..., `index_e0`, ..., `index_e19`
- 300 indices
- 1000 indices
- 3000 indices
- 10000 indices
- 30000 indices
- 100000 indices
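To make the naming scheme above concrete: the test indices are distributed evenly over the letters a through e, with a numeric suffix per letter. A small sketch (my own illustration, not code from the PR) that generates names following this scheme:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: generates test index names like index_a0 ... index_e1, following the
// naming scheme described above (totalCount names, spread evenly over letters a..e).
public class TestIndexNames {

    public static List<String> create(int totalCount) {
        int perLetter = totalCount / 5; // five letters: a, b, c, d, e
        List<String> result = new ArrayList<>(totalCount);
        for (char letter = 'a'; letter <= 'e'; letter++) {
            for (int i = 0; i < perLetter; i++) {
                result.add("index_" + letter + i);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // For totalCount = 10, this yields index_a0, index_a1, index_b0, ..., index_e1
        System.out.println(create(10));
    }
}
```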
We tested with these user configurations:

- A user with full privileges (using `*` for `index_permissions` and `cluster_permissions`)
- A user with a single role giving CRUD permissions on `index_a*` and `index_b*`
- A user with 20 roles giving CRUD permissions individually on `index_a0`, `index_a1`, ...
- A user with 40 roles in total: 20 roles giving READ permissions individually on `index_a0`, `index_a1`, ... and 20 more roles giving WRITE permissions on the same indices
- A user with a single role which uses a regex index pattern with a user attribute. This is interesting because it makes certain pre-computations impossible and requires re-evaluating the index patterns for each request.
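For reference, the last configuration above could be expressed in `roles.yml` roughly like this. This is a hypothetical sketch: the role name and the attribute name are my own inventions, and it assumes the OpenSearch Security conventions of `/.../` for regex index patterns and `${attr...}` for user attribute substitution:

```yaml
# Hypothetical role using a regex index pattern with a user attribute.
# Because ${attr.internal.department} is only known per request, the pattern
# cannot be pre-compiled against the index set and must be re-evaluated each time.
attribute_based_role:
  index_permissions:
    - index_patterns:
        - "/index_${attr.internal.department}_.*/"
      allowed_actions:
        - "read"
```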
The raw result data can be found here: https://docs.google.com/spreadsheets/d/1Hd6pZFICTeplXIun3UpEplANAwQDE0jXbqLnnJz61AI/edit?usp=sharing
In the charts below, dashed lines indicate the performance of the old privilege evaluation code for a particular combination of test dimensions. Solid lines of the same color indicate the performance of the new code with the same test dimensions. The x-axis represents the number of indices on the cluster; the y-axis represents the throughput in operations per second.
`bulk[s]`, `BulkShardRequest`
The performance of `BulkShardRequest`s is the most interesting factor on clusters doing heavy ingestion. A single bulk request will be broken down by the individual indices and shards, resulting in quite a few `BulkShardRequest`s for which privilege evaluation needs to be done in parallel; thus, the performance characteristics here have a high impact.
The privilege evaluation for the top-level `BulkRequest` is less interesting because it is just an index-independent cluster privilege, which is easy to evaluate. Still, we will also review this below.
The performance of the old code degrades with an increasing number of indices. Starting at 30000 indices, we see method call latencies above 10 ms. This is where users on ingestion-heavy clusters often start to experience performance issues and the method calls start to show up in hot thread dumps.
In contrast, the throughput of the new code stays constant, independent of the number of indices. It can be seen that the number of roles still has quite an effect on the throughput, but here we are talking about time differences below 0.1 ms, which should not be significant in a real-world cluster.
`bulk`, `BulkRequest`
The top level bulk action is a cluster action, so it does not require considering the indices on a cluster.
As expected, performance is independent of the number of indices, both in the new implementation and in the old one. However, the new implementation improves throughput by a factor between 2 and 3.
`search`, `SearchRequest`
Search operations become interesting when there are monitoring/alerting solutions issuing search requests on broad index patterns in short time intervals.
Both the old and the new code degrade with a growing number of indices. Profiling shows that this is mostly due not to privilege evaluation, but to index pattern expression resolution.
However, the new code retains method call latencies below 20 ms even on clusters with 100000 indices, while the old code takes up to 5 seconds for a single method call at that scale.
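To illustrate why index pattern expression resolution dominates here: resolving a pattern such as `index_a*` naively requires scanning every index name on the cluster, so the cost grows linearly with the number of indices regardless of how fast the privilege check itself is. A hypothetical sketch (not the actual resolver used by OpenSearch):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: naive wildcard resolution that scans all index names.
// The linear scan is O(number of indices), which is why pattern resolution
// becomes the bottleneck on clusters with very many indices.
public class NaiveIndexResolver {

    /** Resolves a trailing-wildcard pattern like "index_a*" against all index names on the cluster. */
    public static List<String> resolve(String pattern, List<String> allIndices) {
        boolean wildcard = pattern.endsWith("*");
        String prefix = wildcard ? pattern.substring(0, pattern.length() - 1) : pattern;
        List<String> matches = new ArrayList<>();
        for (String index : allIndices) { // this scan is what grows with cluster size
            if (wildcard ? index.startsWith(prefix) : index.equals(pattern)) {
                matches.add(index);
            }
        }
        return matches;
    }
}
```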
See the following chart for a zoomed in section of the 2% of indices case for 10000-100000 indices:
Description
This is a preview of a possible implementation for #3870: Optimized Privilege Evaluation.
The major purpose is to give an initial impression of the approach and to facilitate review. However, more implementation work will be necessary.
Performance tests indicate that the OpenSearch security layer adds a noticeable overhead to the indexing throughput of an OpenSearch cluster. The overhead may vary depending on the number of indices, the use of aliases, the number of roles and the size of the user object. The goal of these changes is to improve privilege evaluation performance and to make it less dependent on the number of indices, etc.
The main behavior will not change. However, I would like to discuss whether this opportunity can be used to get rid of special behaviors which seem obscure and mostly useless.
Issues Resolved
#3870
Testing
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. For more information on following Developer Certificate of Origin and signing off your commits, please check here.