Closed: dgilling closed this issue 1 year ago.
From an initial dive, I think the issue lies with the security permissions around scripting modules. By default, SpecialPermission is granted to the non-plugin codebase, i.e., the default codebase here. But when using scripts, the context carries the policy of the scripting plugin, which does not hold these permissions.
This will affect all repositories that are initialized using plugins, since all of them check for this special permission: S3, GCP, and Azure.
I will continue investigating this issue.
~@kotwanikunal plugins have their own dedicated permissions which are registered with the Java Security Manager via a security policy file. This blog post talks about it in detail [1]~ Ignore me, you've already got there :). It's mostly that the caller's context matters, which here is the Azure plugin (not the Painless script), and it does not have the elevated permissions. Let's try adding it to the policy?
@dgilling assuming you are using repository-azure from this repo, could you modify the plugin-security.policy [2] file and add org.opensearch.SpecialPermission to the list?
~[1] https://opensearch.org/blog/plugins-intro/~
[2] https://github.com/opensearch-project/OpenSearch/blob/e4d9fb50b74a0b4e7f9b7bb1315e92c1f136efde/plugins/repository-azure/src/main/plugin-metadata/plugin-security.policy
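For reference, the suggested addition would look roughly like this in the policy file (a sketch only; the actual grant block in repository-azure contains further permissions):

```
grant {
  permission org.opensearch.SpecialPermission;
};
```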
@saratvemulapalli I think the painless plugin does change the context in some way, since we can query using OpenSearch + Azure when running a non-scripted query. In the case of OpenSearch + painless + Azure, the requests raise the exception.
One of the reasons SpecialPermission exists is to block scripts from accessing Azure connections, per the documentation.
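For context, here is a rough sketch of the mechanism being discussed. `SpecialPermission` and `doPrivileged` below are simplified stand-ins, not the real org.opensearch implementations: when a SecurityManager is installed, `checkPermission` walks every protection domain on the call stack, so a painless-script frame that lacks the permission fails the check even though the Azure plugin itself holds it; `AccessController.doPrivileged` truncates that stack walk at the wrapper's frame.

```java
import java.security.AccessController;
import java.security.BasicPermission;
import java.security.PrivilegedAction;

// Hypothetical sketch mirroring the SocketAccess.doPrivilegedException
// pattern seen in the stack trace; not the actual OpenSearch code.
public class SocketAccessSketch {
    static final class SpecialPermission extends BasicPermission {
        SpecialPermission() { super("*"); }

        static void check() {
            SecurityManager sm = System.getSecurityManager();
            if (sm != null) {
                // Consults every protection domain on the call stack,
                // including a script's, which must also hold the permission.
                sm.checkPermission(new SpecialPermission());
            }
        }
    }

    static <T> T doPrivileged(PrivilegedAction<T> action) {
        // Fails here if any caller on the stack lacks SpecialPermission.
        SpecialPermission.check();
        // Truncates the permission stack walk for the socket operations
        // performed inside the action, so only the plugin's own domain
        // needs the socket permissions.
        return AccessController.doPrivileged(action);
    }

    public static void main(String[] args) {
        System.out.println(doPrivileged(() -> "opened connection"));
    }
}
```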
@kotwanikunal @saratvemulapalli we were thinking this as well, but could this cause a security risk?
@dgilling You can test the feature out on a development environment with the added permission. I am still looking into a fix for this.
@kotwanikunal Interestingly, I added the permission but confirmed it doesn't work. This is on latest 2.7 snapshot
grant {
// azure client opens socket connections to access the repository
permission org.opensearch.SpecialPermission "*";
permission java.net.SocketPermission "*", "connect,resolve";
permission java.lang.RuntimePermission "setFactory";
permission java.net.NetPermission "getProxySelector";
permission java.lang.RuntimePermission "accessDeclaredMembers";
permission java.lang.reflect.ReflectPermission "suppressAccessChecks";
permission java.lang.RuntimePermission "setContextClassLoader";
// azure client set Authenticator for proxy username/password
permission java.net.NetPermission "setDefaultAuthenticator";
};
@kotwanikunal, my bad. I had a syntax error in my permission file. It works once fixed.
@kotwanikunal @saratvemulapalli, even after fixing the syntax, it's still not working. Same error:
"java.base/java.security.AccessControlContext.checkPermission(AccessControlContext.java:485)",
"java.base/java.security.AccessController.checkPermission(AccessController.java:1068)",
"java.base/java.lang.SecurityManager.checkPermission(SecurityManager.java:416)",
"org.opensearch.SpecialPermission.check(SpecialPermission.java:104)",
"org.opensearch.repositories.azure.SocketAccess.doPrivilegedException(SocketAccess.java:68)",
"org.opensearch.repositories.azure.AzureBlobStore.getInputStream(AzureBlobStore.java:282)",
"org.opensearch.repositories.azure.AzureBlobContainer.openInputStream(AzureBlobContainer.java:104)",
"org.opensearch.repositories.azure.AzureBlobContainer.readBlob(AzureBlobContainer.java:122)",
"org.opensearch.index.store.remote.utils.TransferManager.downloadBlockLocally(TransferManager.java:102)",
"org.opensearch.index.store.remote.utils.TransferManager.fetchOriginBlob(TransferManager.java:88)",
"org.opensearch.index.store.remote.utils.TransferManager.lambda$fetchBlob$0(TransferManager.java:54)",
"org.opensearch.index.store.remote.utils.ConcurrentInvocationLinearizer.linearizeInternal(ConcurrentInvocationLinearizer.java:70)",
"org.opensearch.index.store.remote.utils.ConcurrentInvocationLinearizer.linearize(ConcurrentInvocationLinearizer.java:46)",
"org.opensearch.index.store.remote.utils.TransferManager.fetchBlob(TransferManager.java:52)",
"org.opensearch.index.store.remote.file.OnDemandBlockSnapshotIndexInput.fetchBlock(OnDemandBlockSnapshotIndexInput.java:148)",
"org.opensearch.index.store.remote.file.OnDemandBlockIndexInput.demandBlock(OnDemandBlockIndexInput.java:340)",
"org.opensearch.index.store.remote.file.OnDemandBlockIndexInput.seekInternal(OnDemandBlockIndexInput.java:311)",
"org.opensearch.index.store.remote.file.OnDemandBlockIndexInput.seek(OnDemandBlockIndexInput.java:209)",
"org.opensearch.index.store.remote.file.OnDemandBlockSnapshotIndexInput.seek(OnDemandBlockSnapshotIndexInput.java:28)",
"org.apache.lucene.codecs.lucene90.IndexedDISI.advanceBlock(IndexedDISI.java:484)",
@dgilling are there steps I could follow to reproduce the problem?
@saratvemulapalli yes:
Steps taken:
1. Added
permission org.opensearch.SpecialPermission;
to plugins/repository-azure/plugin-security.policy, similar to the permissions for core OS here.
2. Ran a scripted query containing:
"my_agg": {
  "sum": {
    "script": {
      "inline": "(doc['field_a'].empty ? 0 : doc['field_a'].value) * (doc['field_b'].empty ? 1 : doc['field_b'].value)"
    }
  }
}
We have two identical clusters with the same cluster settings, image, and data mapping, differing only in the size of data/machines. On the tiny dev cluster, we don't hit the security exception. On a larger staging cluster, we do.
In addition, when I enabled trace logs for org.opensearch.repositories and org.opensearch.index.store.remote, the small cluster is not printing readBlob but the large one is. My thinking is the small one only hit the cache and so never reached that code path. However, I'm not too familiar with that code and could be wrong.
[2023-03-25T18:09:17,146][TRACE][o.o.r.a.AzureBlobContainer] [d10b89b39b03] readBlob(__5hjiSNCSTPS-FECemg.part0) from position [4051697664] with length [8388608]
[2023-03-25T18:09:17,146][TRACE][o.o.r.a.AzureBlobStore ] [d10b89b39b03] reading container [opensearch-snapshots], blob [indices/0ID0aRZ4T-evvWREWFBI/0/__5hjiSNCSTPS-FECemgDpart0]
[2023-03-25T18:09:17,146][TRACE][o.o.r.a.AzureBlobContainer] [d10b89b39b03] readBlob(__Tm3qt-zHTVmCk858tU.part0) from position [4034920448] with length [8388608]
[2023-03-25T18:09:17,147][TRACE][o.o.r.a.AzureBlobStore ] [d10b89b39b03] reading container [opensearch-snapshots], blob [indices/0ID0aRZ4T-evvWREWFBI/3/__Tm3qt-zHTVmCk858tU.part0]
[2023-03-25T18:09:17,148][TRACE][o.o.r.a.AzureBlobContainer] [d10b89b39b03] readBlob(___qkclJ3-RhikTZ7vQNif.part0) from position [4026531840] with length [8388608]
[2023-03-25T18:09:17,148][TRACE][o.o.r.a.AzureBlobStore ] [d10b89b39b03] reading container [opensearch-snapshots], blob [indices/0ID0aRZ4T-evvWREWFBI/7/___qkclJ3-RhikTZ7vQNif.part0]
We are using the latest 2.7.0 snapshot from the opensearchstaging Docker Hub. However, I should note that it doesn't include this latest PR, which changes some of the logic around TransferManager, which is why my stack trace includes fetchOriginBlob.
Manifest.yml:
---
schema-version: '1.1'
build:
name: OpenSearch
version: 2.7.0
platform: linux
architecture: x64
distribution: tar
location: https://ci.opensearch.org/ci/dbc/distribution-build-opensearch/2.7.0/7279/linux/x64/tar/dist/opensearch/opensearch-2.7.0-linux-x64.tar.gz
id: '7279'
components:
- name: OpenSearch
repository: https://github.com/opensearch-project/OpenSearch.git
ref: 2.x
commit_id: b392d2ab5cd6f97d621d89cb2bacbcfb35ee46eb
location: https://ci.opensearch.org/ci/dbc/distribution-build-opensearch/2.7.0/7279/linux/x64/tar/builds/opensearch/dist/opensearch-min-2.7.0-linux-x64.tar.gz
- name: common-utils
repository: https://github.com/opensearch-project/common-utils.git
ref: 2.x
commit_id: 95d168bc340521253b17e6f2fd3bd60332323e32
- name: job-scheduler
repository: https://github.com/opensearch-project/job-scheduler.git
ref: 2.x
commit_id: 5081933bbfca9b0273a09411fb2d28f334fd09f3
location: https://ci.opensearch.org/ci/dbc/distribution-build-opensearch/2.7.0/7279/linux/x64/tar/builds/opensearch/plugins/opensearch-job-scheduler-2.7.0.0.zip
- name: k-NN
repository: https://github.com/opensearch-project/k-NN.git
ref: 2.x
commit_id: 582369f81346bab48e121096b9d1b0bdebad9fa7
location: https://ci.opensearch.org/ci/dbc/distribution-build-opensearch/2.7.0/7279/linux/x64/tar/builds/opensearch/plugins/opensearch-knn-2.7.0.0.zip
- name: geospatial
repository: https://github.com/opensearch-project/geospatial.git
ref: 2.x
commit_id: e2f4af418e9e772aa694d2552c25d4f9020f51a6
location: https://ci.opensearch.org/ci/dbc/distribution-build-opensearch/2.7.0/7279/linux/x64/tar/builds/opensearch/plugins/opensearch-geospatial-2.7.0.0.zip
- name: security
repository: https://github.com/opensearch-project/security.git
ref: 2.x
commit_id: 17596f750cf0f6d1b35e4b9870218b180d4f8ede
location: https://ci.opensearch.org/ci/dbc/distribution-build-opensearch/2.7.0/7279/linux/x64/tar/builds/opensearch/plugins/opensearch-security-2.7.0.0.zip
- name: cross-cluster-replication
repository: https://github.com/opensearch-project/cross-cluster-replication.git
ref: 2.x
commit_id: 72082ad4637005f464b54c7f86d84d90f9f3a847
location: https://ci.opensearch.org/ci/dbc/distribution-build-opensearch/2.7.0/7279/linux/x64/tar/builds/opensearch/plugins/opensearch-cross-cluster-replication-2.7.0.0.zip
- name: ml-commons
repository: https://github.com/opensearch-project/ml-commons.git
ref: 2.x
commit_id: 021c4159c6fe0aee291dbcf7185c3c09ab99c36f
location: https://ci.opensearch.org/ci/dbc/distribution-build-opensearch/2.7.0/7279/linux/x64/tar/builds/opensearch/plugins/opensearch-ml-2.7.0.0.zip
- name: neural-search
repository: https://github.com/opensearch-project/neural-search.git
ref: 2.x
commit_id: 081e42d0022991b1b93e8787ddee7484c00b822b
location: https://ci.opensearch.org/ci/dbc/distribution-build-opensearch/2.7.0/7279/linux/x64/tar/builds/opensearch/plugins/opensearch-neural-search-2.7.0.0.zip
- name: sql
repository: https://github.com/opensearch-project/sql.git
ref: 2.x
commit_id: 11d0d6dd7b0dac06ba2a5918ffd4d169501b004d
location: https://ci.opensearch.org/ci/dbc/distribution-build-opensearch/2.7.0/7279/linux/x64/tar/builds/opensearch/plugins/opensearch-sql-2.7.0.0.zip
> My thinking is the small one only hit the cache and so never reached that code path.

That's almost certainly the case. A small dataset will quickly be pulled to disk with no need to evict anything, so it's likely that by the time it gets to the painless script the data is already local.
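A toy sketch of that behavior (hypothetical names, not the actual TransferManager code): the remote read, and with it the permission check, only runs on a file-cache miss, so a dataset that fits entirely in the cache never exercises the failing code path.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Hypothetical sketch of cache-first blob fetching; not the real
// TransferManager/FileCache implementation.
public class FetchSketch {
    static final Map<String, byte[]> fileCache = new ConcurrentHashMap<>();

    static byte[] fetchBlob(String name, Supplier<byte[]> remoteRead) {
        // On a cache hit, readBlob (and its SpecialPermission check)
        // is never invoked; only a miss reaches the repository plugin.
        return fileCache.computeIfAbsent(name, k -> remoteRead.get());
    }

    public static void main(String[] args) {
        fileCache.put("warm-block", new byte[] {1});
        byte[] data = fetchBlob("warm-block", () -> {
            throw new IllegalStateException("remote read should not happen");
        });
        System.out.println(data.length);
    }
}
```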
@andrross makes sense. I should also note that I saw data being fetched from the remote repository for the small indices right after the snapshot restore command, so it was already in the file cache before the first search query (vs. being fetched on demand). Is that expected?
@saratvemulapalli let me know if there is something else I should try besides adding the security permission.
Thanks @kotwanikunal. We tested #6914 and it works.
There was an additional fix that was merged with #6927, which covers another case where you can encounter the error. Please reopen the issue if you encounter it again even after both of these fixes.
Much appreciated, @kotwanikunal, for fixing this. We upgraded the large staging cluster with both the #6927 and #6917 patches and so far so good :). We'll monitor and follow up if we see an issue.
Unrelated, but curious: is there an issue with the Jenkins job publishing to Docker Hub? It seems the last few builds didn't push to opensearchstaging/opensearch:2.7.0, so we had to build from source.
Describe the bug
When performing a scripted query on a remote index, a security exception is thrown. This occurs on the 2.7.0 snapshot and also with the released 2.5.0.
Seems related to how OS/ES handles evaluated permissions.
To Reproduce
1. Restore an index with storage_type set to remote_snapshot.
2. Run a painless scripted query against it.