Open peternied opened 8 months ago
Based on these results, here are a couple of findings. Overall, I would recommend avoiding StringQueries when using masked fields, as there appears to be a disproportionate use of memory compared to other search types. There are likely optimizations that could be added for some kinds of fields, such as building a cache of masked values to avoid the allocation tax, but it's unclear whether this would help in high-cardinality scenarios.
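To make the caching idea concrete, here is a minimal sketch of what such a cache might look like. The class name, entry cap, and `maskFn` parameter are hypothetical illustrations, not code from the plugin; the comments also note why high cardinality undercuts the idea.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical bounded cache of raw value -> masked value, so repeated terms are
// only hashed/allocated once. With a high-cardinality field the hit rate is low
// and the map itself becomes extra heap, which is why the benefit is unclear.
final class MaskedValueCache {
    private static final int MAX_ENTRIES = 10_000; // illustrative cap

    private final Map<String, String> cache = new ConcurrentHashMap<>();

    String mask(String rawValue, Function<String, String> maskFn) {
        if (cache.size() >= MAX_ENTRIES) {
            return maskFn.apply(rawValue); // fall back rather than grow without bound
        }
        return cache.computeIfAbsent(rawValue, maskFn);
    }
}
```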
The following are a couple of sources for these recommendations:
The Attempts column increases if GC was called during a test run, as that makes the memory calculation likely to be incorrect. So when Attempts is higher, more GCs were performed, indicating higher JVM memory pressure. Baseline performs best, with Term being impacted slightly more often, and StringQuery tripping GC most frequently.
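For context, the kind of measurement described above could look roughly like the sketch below (this is an assumed shape, not the actual test harness): a sample is discarded and re-attempted whenever the JVM's GC collection count changes mid-measurement, which is what drives the Attempts column up.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

final class HeapSampler {

    // Total completed collections across all garbage collectors.
    private static long totalGcCount() {
        long count = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            count += Math.max(0, gc.getCollectionCount());
        }
        return count;
    }

    // Heap consumed by one run of `work`, retrying when a GC fires mid-sample.
    static long measure(Runnable work, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            long gcBefore = totalGcCount();
            long before = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage().getUsed();
            work.run();
            long after = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage().getUsed();
            if (totalGcCount() == gcBefore) {
                return after - before; // clean sample: no GC interfered
            }
        }
        return -1; // no clean sample within the attempt budget
    }
}
```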
The complexity of the masking operation, masking a LONG vs. a STRING, did not appear to impact the usage; however, there is a clear increase of 15-30% in memory usage when a field is masked in either the Baseline or Term scenarios. This overhead is semi-expected, since the masking operations happen as the data is accessed.
StringQueries
When comparing Term vs. StringQueries, there is a 2x increase in memory usage when masked fields are involved. This may warrant additional rework, since it doesn't align with the behavior pattern seen for non-masked fields, which are generally more memory expensive.
[Triage] Hi @peternied, thank you for the extremely detailed issue. You provided a lot of good details and we can take a couple of paths forward:
The first task is definitely actionable; the other two tasks can be investigated on an RFC or help-wanted basis, since it may not be too clear how to correct this quickly.
@peternied I noticed that field masking is being applied on search requests where the request size is 0 and the only aggregations requested are count or cardinality. This may be an area that can be optimized to reduce the memory footprint of field masking.
This PR on my fork contains an example: https://github.com/cwperks/security/pull/24 https://github.com/opensearch-project/security/pull/4362
Assign a user to be able to search the index, but anonymize the artist field.
Run a Search Request of size 0 with a cardinality aggregation (uniqueness count)
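For illustration, such a request could be built with the OpenSearch Java APIs along these lines; the index name `songs` and the aggregation name are just examples, not taken from the linked PR.

```java
import org.opensearch.action.search.SearchRequest;
import org.opensearch.search.aggregations.AggregationBuilders;
import org.opensearch.search.builder.SearchSourceBuilder;

class CardinalityExample {
    // size(0): no hits are returned, so the masked "artist" values never appear
    // in the response; only the number of distinct values does.
    static SearchRequest distinctArtistsRequest() {
        return new SearchRequest("songs").source(
            new SearchSourceBuilder().size(0)
                .aggregation(AggregationBuilders.cardinality("distinct_artists").field("artist"))
        );
    }
}
```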
For these types of queries, is applying the field masking necessary? It looks like extra overhead for a query that is already not requesting the sensitive info. If the size is >0, or there are aggregations other than cardinality and count (such as max or min, where raw data can be exposed), then the field masking should be applied.
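A rough sketch of the kind of guard being described is below. The method name and the exact checks are illustrative assumptions on my part; the actual change lives in the linked PR.

```java
import org.opensearch.action.search.SearchRequest;
import org.opensearch.search.aggregations.AggregationBuilder;
import org.opensearch.search.aggregations.metrics.CardinalityAggregationBuilder;
import org.opensearch.search.aggregations.metrics.ValueCountAggregationBuilder;

class MaskingSkipCheck {
    // Illustrative only: true when the response cannot contain raw field values,
    // i.e. no hits are returned (size == 0) and every aggregation produces only counts.
    static boolean maskingCanBeSkipped(SearchRequest request) {
        if (request.source() == null || request.source().size() != 0 || request.source().aggregations() == null) {
            return false;
        }
        for (AggregationBuilder agg : request.source().aggregations().getAggregatorFactories()) {
            if (!(agg instanceof CardinalityAggregationBuilder || agg instanceof ValueCountAggregationBuilder)) {
                return false;
            }
        }
        return true;
    }
}
```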
Here's the updated table with the last row moved to the top:
Role | Condition | Count | Attempts | Avg Heap Used | Max Heap Used | Min Heap Used | Std Deviation |
---|---|---|---|---|---|---|---|
reader | | 100 | 108 | 861,267 | 1,333,568 | 697,264 | 126,293
admin | | 100 | 123 | 3,861,272 | 24,316,608 | 632,368 | 4,897,820
reader | ROLE_WITH_NO_MASKING | 100 | 108 | 681,459 | 943,384 | 504,032 | 74,876 |
reader | MASKING_LOW_REPEAT_VALUE | 100 | 107 | 663,067 | 835,976 | 524,560 | 63,470 |
reader | MASKING_RANDOM_LONG | 100 | 107 | 938,830 | 1,236,864 | 646,680 | 156,187 |
reader | MASKING_RANDOM_STRING | 100 | 107 | 966,501 | 1,299,488 | 601,336 | 138,535 |
Here's the updated table with the last row moved to the top:
Role | Condition | Count | Attempts | Avg Heap Used | Max Heap Used | Min Heap Used | Std Deviation |
---|---|---|---|---|---|---|---|
reader | | 100 | 106 | 982,778 | 1,271,616 | 717,296 | 134,142
admin | | 100 | 110 | 904,744 | 1,548,496 | 526,032 | 249,730
reader | ROLE_WITH_NO_MASKING | 100 | 107 | 667,823 | 1,063,920 | 446,520 | 74,430 |
reader | MASKING_LOW_REPEAT_VALUE | 100 | 107 | 665,148 | 878,424 | 529,000 | 69,972 |
reader | MASKING_RANDOM_LONG | 100 | 107 | 1,049,116 | 1,654,584 | 642,648 | 153,067 |
reader | MASKING_RANDOM_STRING | 100 | 107 | 984,706 | 1,248,968 | 566,432 | 145,058 |
Role | Condition | Count | Attempts | Avg Heap Used | Max Heap Used | Min Heap Used | Std Deviation |
---|---|---|---|---|---|---|---|
admin | | 100 | 107 | 1,824,332 | 10,282,096 | 293,264 | 1,807,192
reader | | 100 | 105 | 744,310 | 1,247,896 | 474,384 | 180,697
reader | ROLE_WITH_NO_MASKING | 100 | 105 | 746,171 | 1,337,872 | 464,712 | 180,388 |
reader | MASKING_LOW_REPEAT_VALUE | 100 | 106 | 1,564,257 | 2,180,256 | 1,247,528 | 208,667 |
reader | MASKING_RANDOM_LONG | 100 | 106 | 1,385,117 | 2,133,552 | 848,360 | 283,312 |
reader | MASKING_RANDOM_STRING | 100 | 105 | 1,225,246 | 2,181,888 | 799,584 | 258,089 |
Role | Condition | Count | Attempts | Avg Heap Used | Max Heap Used | Min Heap Used | Std Deviation |
---|---|---|---|---|---|---|---|
admin | | 100 | 110 | 847,600 | 1,481,680 | 626,416 | 109,946
reader | | 100 | 108 | 869,619 | 1,576,624 | 621,760 | 180,665
reader | ROLE_WITH_NO_MASKING | 100 | 107 | 763,011 | 1,086,568 | 586,408 | 90,824 |
reader | MASKING_LOW_REPEAT_VALUE | 100 | 110 | 2,345,859 | 2,830,104 | 1,167,248 | 216,459 |
reader | MASKING_RANDOM_LONG | 100 | 110 | 2,249,719 | 2,523,008 | 2,058,280 | 81,800 |
reader | MASKING_RANDOM_STRING | 100 | 109 | 2,272,825 | 2,807,928 | 1,275,320 | 196,273 |
Role | Condition | Count | Attempts | Avg Heap Used | Max Heap Used | Min Heap Used | Std Deviation |
---|---|---|---|---|---|---|---|
admin | | 100 | 111 | 4,848,198 | 29,831,272 | 812,496 | 6,303,649
reader | | 100 | 106 | 1,043,223 | 1,562,624 | 781,976 | 145,309
reader | ROLE_WITH_NO_MASKING | 100 | 106 | 1,014,046 | 1,419,760 | 718,328 | 143,819 |
reader | MASKING_LOW_REPEAT_VALUE | 100 | 110 | 3,269,142 | 3,852,016 | 2,437,888 | 190,123 |
reader | MASKING_RANDOM_LONG | 100 | 107 | 3,206,087 | 3,532,984 | 2,926,448 | 128,675 |
reader | MASKING_RANDOM_STRING | 100 | 108 | 3,184,283 | 3,886,000 | 2,569,528 | 239,173 |
Here's the provided data reformatted into the specified table format:
Role | Condition | Count | Attempts | Avg Heap Used | Max Heap Used | Min Heap Used | Std Deviation |
---|---|---|---|---|---|---|---|
admin | | 100 | 110 | 1,167,121 | 1,493,264 | 964,176 | 91,059
reader | | 100 | 107 | 1,064,978 | 1,516,824 | 747,968 | 155,176
reader | ROLE_WITH_NO_MASKING | 100 | 107 | 1,065,616 | 1,606,152 | 697,152 | 176,486 |
reader | MASKING_LOW_REPEAT_VALUE | 100 | 115 | 5,272,236 | 5,704,016 | 4,998,000 | 135,232 |
reader | MASKING_RANDOM_LONG | 100 | 113 | 5,387,294 | 5,730,312 | 5,162,648 | 114,408 |
reader | MASKING_RANDOM_STRING | 100 | 113 | 5,223,454 | 5,552,800 | 5,001,576 | 100,145 |
Role | Condition | Count | Attempts | Avg Heap Used | Max Heap Used | Min Heap Used | Std Deviation |
---|---|---|---|---|---|---|---|
reader | | 100 | 107 | 683,951 | 1,299,064 | 61,144 | 255,714
reader | MASKING_RANDOM_STRING | 100 | 108 | 724,074 | 1,193,984 | 532,120 | 121,375 |
reader | MASKING_RANDOM_LONG | 100 | 108 | 728,235 | 1,209,752 | 499,328 | 106,161 |
reader | MASKING_LOW_REPEAT_VALUE | 100 | 109 | 685,820 | 896,616 | 437,232 | 69,834 |
reader | ROLE_WITH_NO_MASKING | 100 | 109 | 692,526 | 943,160 | 512,136 | 72,825 |
admin | | 100 | 122 | 2,736,726 | 19,272,432 | 622,224 | 3,459,203
Here is the formatted data in a table:
Role | Condition | Count | Attempts | Avg Heap Used | Max Heap Used | Min Heap Used | Std Deviation |
---|---|---|---|---|---|---|---|
reader | | 100 | 107 | 706,473 | 1,086,544 | 528,256 | 90,361
reader | MASKING_RANDOM_STRING | 100 | 107 | 715,920 | 1,099,000 | 545,888 | 92,117 |
reader | MASKING_RANDOM_LONG | 100 | 106 | 706,888 | 1,255,768 | 517,880 | 109,191 |
reader | MASKING_LOW_REPEAT_VALUE | 100 | 107 | 797,238 | 1,289,392 | 533,688 | 156,511 |
reader | ROLE_WITH_NO_MASKING | 100 | 107 | 805,953 | 1,080,208 | 485,448 | 132,880 |
admin | | 100 | 108 | 702,158 | 1,174,728 | 531,128 | 91,654
Role | Condition | Count | Attempts | Avg Heap Used | Max Heap Used | Min Heap Used | Std Deviation |
---|---|---|---|---|---|---|---|
reader | | 100 | 105 | 1,240,216 | 1,834,912 | 895,920 | 182,783
admin | | 100 | 108 | 5,719,315 | 41,066,960 | 636,096 | 8,486,929
reader | ROLE_WITH_NO_MASKING | 100 | 105 | 752,132 | 1,409,144 | 526,120 | 180,726 |
reader | MASKING_LOW_REPEAT_VALUE | 100 | 105 | 767,652 | 1,255,280 | 526,224 | 169,239 |
reader | MASKING_RANDOM_LONG | 100 | 105 | 1,290,737 | 2,092,840 | 916,792 | 219,425 |
reader | MASKING_RANDOM_STRING | 100 | 106 | 1,268,168 | 1,835,760 | 927,896 | 207,900 |
Role | Condition | Count | Attempts | Avg Heap Used | Max Heap Used | Min Heap Used | Std Deviation |
---|---|---|---|---|---|---|---|
reader | | 100 | 107 | 1,411,215 | 1,729,784 | 1,167,728 | 108,204
reader | MASKING_RANDOM_STRING | 100 | 107 | 1,458,550 | 1,748,408 | 1,289,344 | 81,905 |
reader | MASKING_RANDOM_LONG | 100 | 108 | 1,519,651 | 1,792,320 | 1,256,032 | 117,450 |
reader | MASKING_LOW_REPEAT_VALUE | 100 | 106 | 890,869 | 1,327,336 | 544,312 | 175,251 |
reader | ROLE_WITH_NO_MASKING | 100 | 108 | 904,850 | 1,295,392 | 677,272 | 135,768 |
admin | | 100 | 108 | 924,740 | 1,534,144 | 626,872 | 191,236
Role | Condition | Count | Attempts | Avg Heap Used | Max Heap Used | Min Heap Used | Std Deviation |
---|---|---|---|---|---|---|---|
reader | | 100 | 112 | 3,187,505 | 3,573,144 | 2,712,240 | 160,524
reader | MASKING_RANDOM_STRING | 100 | 113 | 3,317,756 | 3,626,848 | 2,635,784 | 112,230 |
reader | MASKING_RANDOM_LONG | 100 | 115 | 3,341,987 | 3,840,784 | 3,120,224 | 151,827 |
reader | MASKING_LOW_REPEAT_VALUE | 100 | 108 | 1,065,400 | 1,433,872 | 763,416 | 148,780 |
reader | ROLE_WITH_NO_MASKING | 100 | 110 | 1,089,621 | 1,363,248 | 819,416 | 145,270 |
admin | | 100 | 125 | 2,498,277 | 14,298,976 | 279,928 | 2,153,253
Role | Condition | Count | Attempts | Avg Heap Used | Max Heap Used | Min Heap Used | Std Deviation |
---|---|---|---|---|---|---|---|
reader | | 100 | 113 | 4,825,162 | 5,211,376 | 4,600,384 | 114,015
reader | MASKING_RANDOM_STRING | 100 | 114 | 4,923,908 | 5,256,120 | 4,749,248 | 111,796 |
reader | MASKING_RANDOM_LONG | 100 | 114 | 4,893,227 | 5,327,032 | 4,687,368 | 129,189 |
reader | MASKING_LOW_REPEAT_VALUE | 100 | 108 | 1,160,330 | 1,544,976 | 978,552 | 81,653 |
reader | ROLE_WITH_NO_MASKING | 100 | 108 | 1,166,856 | 1,414,224 | 883,584 | 91,983 |
admin | | 100 | 113 | 1,361,717 | 1,961,136 | 1,106,248 | 214,172
In the image below, I have compared the results of the original tables with the results of the tables with Craig's change. Red in the left table means the corresponding value is greater than the equivalent position in the right-hand tables. As shown, the change does directly impact the testing scenarios covered by Peter's test framework for a majority of the measured values.
Isn't lower memory usage better? Admin seems to be doing better in the original vs. the updated change; it looks like it went from 1GB to 4GB, which would be a 400% regression.
Looking at the difference between the changes, focusing just on MASKING_RANDOM_STRING, that seems like only a 5% improvement. While that isn't nothing, it's pretty close to the standard deviation of 111,796.
Role | Condition | Count | Attempts | Avg Heap Used | Max Heap Used | Min Heap Used | Std Deviation |
---|---|---|---|---|---|---|---|
reader | MASKING_RANDOM_STRING | 100 | 113 | 5,223,454 | 5,552,800 | 5,001,576 | 100,145 |
Role | Condition | Count | Attempts | Avg Heap Used | Max Heap Used | Min Heap Used | Std Deviation |
---|---|---|---|---|---|---|---|
reader | MASKING_RANDOM_STRING | 100 | 114 | 4,923,908 | 5,256,120 | 4,749,248 | 111,796 |
@scrawfor99 @peternied The change in https://github.com/cwperks/security/pull/24 is a very narrow change that applies to search requests that have size = 0 and only contain cardinality or count aggregations. It's probably not representative to run the entire suite of tests when the change applies narrowly to those two use cases.
Hi @cwperks, of course. No worries. I am sure the change accomplishes what you intended. I just reran the tests to see if it would fix everything or not. It is no problem to add some more changes and see where that goes.
I went through the code and found some possible refactoring that may help improve the heap use based on the implementation.
I also found some lines whose purpose I am not sure of...
First, we should avoid expanding the aggregation builder factories if we can help it. I noticed we iterate over the list at least 3 separate times within DlsFlsValveImpl alone. While this is a linear operation, if we are concerned about heap use, we end up storing additional objects each time we do that, so it's not great.
We can also avoid the frequent conditionals used when identifying the aggregationBuilder type via instanceof by adding a setHint method to the AB interface that all of the aggregations implement (this would happen in core).
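As one possible shape for that idea (loosely interpreting the setHint suggestion; the interface name, enum, and methods are all hypothetical, and this would be a core change rather than a plugin change):

```java
// Hypothetical core interface: aggregation builders carry a coarse "hint" about
// what they produce, so consumers such as the security plugin can branch on the
// hint instead of running instanceof against every concrete builder class.
public interface AggregationHint {
    enum Kind { CARDINALITY, VALUE_COUNT, TERMS, OTHER }

    void setHint(Kind kind); // set once by the concrete builder

    Kind getHint();          // read by consumers
}
```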
We can avoid storing strings in the heap by building log messages only when the appropriate logger level is set. This would require a larger logger rework, but instead of always constructing every line and only printing based on the configured level, we would only construct lines which pass the configured logging level. This matters more than it seems, since in some areas we are logging complete requests as JSON, which takes up a fair amount of space.
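For example, with standard Log4j2 usage the expensive message construction can be guarded or deferred (shown only to illustrate the point; the class and request variable are placeholders):

```java
import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.Logger;

class LoggingExample {
    private static final Logger log = LogManager.getLogger(LoggingExample.class);

    void logEvaluatedRequest(Object request) {
        // Guard: the (potentially large) string is never built when debug is off.
        if (log.isDebugEnabled()) {
            log.debug("evaluating request: {}", request);
        }
        // Or defer with a Log4j2 lambda supplier: the message parameter is computed
        // lazily, only when the debug level is actually enabled.
        log.debug("evaluating request: {}", () -> request.toString());
    }
}
```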
We can avoid checking each request for an update or resize request by adding a flag to the interface which allows us to fetch whether they are cache-able.
Likewise, we can have a request signal whether it has source.profile set or a min doc count of 0.
We can try to intern strings throughout the code.
We can swap to HashSets etc.
We can replace streams with plain loops, since streams are less memory efficient (see the sketch after this list).
Condense the setFls code, which duplicates the serialization process done in the DLS logic.
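To make the streams and HashSet points concrete, this is the general shape of the rewrite; the field-pattern example is made up and not taken from the plugin.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

class StreamVsLoop {

    // Stream version: allocates the pipeline, lambda objects, and a collector.
    static Set<String> maskedFieldsStream(List<String> patterns) {
        return patterns.stream().filter(p -> p.contains("::")).collect(Collectors.toSet());
    }

    // Loop version: a single pre-sized HashSet and no intermediate objects.
    static Set<String> maskedFieldsLoop(List<String> patterns) {
        Set<String> result = new HashSet<>(patterns.size());
        for (String p : patterns) {
            if (p.contains("::")) {
                result.add(p);
            }
        }
        return result;
    }
}
```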
It's also not clear why we deserialize in this code:
private void setDlsHeaders(EvaluatedDlsFlsConfig dlsFls, ActionRequest request) {
if (!dlsFls.getDlsQueriesByIndex().isEmpty()) {
Map<String, Set<String>> dlsQueries = dlsFls.getDlsQueriesByIndex();
if (request instanceof ClusterSearchShardsRequest && HeaderHelper.isTrustedClusterRequest(threadContext)) {
threadContext.addResponseHeader(
ConfigConstants.OPENDISTRO_SECURITY_DLS_QUERY_HEADER,
Base64Helper.serializeObject((Serializable) dlsQueries)
);
if (log.isDebugEnabled()) {
log.debug("added response header for DLS info: {}", dlsQueries);
}
} else {
if (threadContext.getHeader(ConfigConstants.OPENDISTRO_SECURITY_DLS_QUERY_HEADER) != null) {
Object deserializedDlsQueries = Base64Helper.deserializeObject(
threadContext.getHeader(ConfigConstants.OPENDISTRO_SECURITY_DLS_QUERY_HEADER),
threadContext.getTransient(ConfigConstants.USE_JDK_SERIALIZATION)
);
if (!dlsQueries.equals(deserializedDlsQueries)) {
throw new OpenSearchSecurityException(
ConfigConstants.OPENDISTRO_SECURITY_DLS_QUERY_HEADER + " does not match (SG 900D)"
);
}
} else {
threadContext.putHeader(
ConfigConstants.OPENDISTRO_SECURITY_DLS_QUERY_HEADER,
Base64Helper.serializeObject((Serializable) dlsQueries)
);
if (log.isDebugEnabled()) {
log.debug("attach DLS info: {}", dlsQueries);
}
}
}
}
}
Avoiding the deserialization at the does not match check would save some heap.
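One way that could be avoided, sketched below, is to compare the already-serialized header against a freshly serialized copy instead of round-tripping through deserializeObject. This assumes Base64Helper.serializeObject is deterministic for equal inputs, which would need to be verified before making the change.

```java
// Sketch: compare serialized forms instead of deserializing the existing header
// back into a Map just to run equals().
String existingHeader = threadContext.getHeader(ConfigConstants.OPENDISTRO_SECURITY_DLS_QUERY_HEADER);
if (existingHeader != null) {
    String expectedHeader = Base64Helper.serializeObject((Serializable) dlsQueries);
    if (!expectedHeader.equals(existingHeader)) {
        throw new OpenSearchSecurityException(
            ConfigConstants.OPENDISTRO_SECURITY_DLS_QUERY_HEADER + " does not match (SG 900D)"
        );
    }
}
```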
I will start going through making these changes and rerun the tests. Leaving the interface and logger changes for last.
Thanks for the deep dive @scrawfor99! How do the performance metrics look with the suggested changes implemented?
Should this be split into multiple parts?
Should this be split into multiple parts?
Yes please - ideally each change would be in isolation with clean before and after results that prove we are moving in the right direction.
I just finished going over the DlsFlsLeafReader in a similar manner to the DlsFlsValveImpl. There were fewer obvious changes to be made than in the latter package.
Ultimately some options I found were:
Now going to break the changes for the DlsFlsValveImpl into the following:
Change Version | Change
---|---
1 | Avoid aggregation builder expansion
2 | Remove logging
3 | Use Hash Sets
4 | Use loops instead of streams
5 | Condense setFls code
I will share the test results of each change type after I make them.
The interface-modifying changes I am not going to test at the moment, since that requires more effort. I will see how much improvement these changes make, and if it is not much, I will go forward with the interface changes.
These are the results of running the test suite after making the above change configurations.
As we can see, there is no obvious pattern for the most effective way to improve the heap performance of the different query types.
I am not sure of the next steps forward for improving this, so I am going to reach out to @jainankitk to see if they have any recommendations.
What is the bug? We've seen reports of heap memory usage spiking with field masking enabled. With how masking is implemented at the leaf level, it's possible that certain types of queries cause the masked fields to be materialized even when they are not used.
How can one reproduce the bug? Steps to reproduce the behavior:
./gradlew integrationTest --tests org.opensearch.security.MaskingTests
Baseline behavior
Baseline query with the following format:
Creating 3 Indices with 5000 Documents
Creating 3 Indices with 50000 Documents
Query with Aggregate Filter
Query with the following format:
Creating 3 Indices with 5000 Documents
Creating 3 Indices with 50000 Documents
Term Match Query:
Query with the following format:
Creating 3 Indices with 5000 Documents
Creating 3 Indices with 50000 Documents