opensearch-project / OpenSearch

🔎 Open source distributed and RESTful search engine.
https://opensearch.org/docs/latest/opensearch/index/
Apache License 2.0

[BUG] [Searchable Snapshot] SpecialPermission error when using a scripted query #6641

Closed. dgilling closed this issue 1 year ago.

dgilling commented 1 year ago

Describe the bug: When performing a scripted query on a remote (searchable snapshot) index, a security exception is thrown. This occurs on a 2.7.0 snapshot and also on released 2.5.0.

Seems related to how OS/ES handles evaluated permissions.

To Reproduce:

  1. Restore an index where storage_type is set to remote_snapshot
  2. Perform a scripted search using painless
  3. A security exception is thrown. On a normal index, no exception occurs.

Expected behavior: The scripted query succeeds, as it does on a normal index.


Stack trace:

 Caused by: java.lang.SecurityException: access denied ("org.opensearch.SpecialPermission" "*")
    at java.security.AccessControlContext.checkPermission(AccessControlContext.java:485) ~[?:?]
    at java.security.AccessController.checkPermission(AccessController.java:1068) ~[?:?]
    at java.lang.SecurityManager.checkPermission(SecurityManager.java:416) ~[?:?]
    at org.opensearch.SpecialPermission.check(SpecialPermission.java:104) ~[opensearch-2.5.0.jar:2.5.0]
    at org.opensearch.repositories.azure.SocketAccess.doPrivilegedException(SocketAccess.java:68) ~[?:?]
    at org.opensearch.repositories.azure.AzureBlobStore.getInputStream(AzureBlobStore.java:282) ~[?:?]
    at org.opensearch.repositories.azure.AzureBlobContainer.openInputStream(AzureBlobContainer.java:104) ~[?:?]
    at org.opensearch.repositories.azure.AzureBlobContainer.readBlob(AzureBlobContainer.java:122) ~[?:?]
    at org.opensearch.index.store.remote.utils.TransferManager.fetchBlob(TransferManager.java:59) ~[opensearch-2.5.0.jar:2.5.0]
    at org.opensearch.index.store.remote.file.OnDemandBlockSnapshotIndexInput.fetchBlock(OnDemandBlockSnapshotIndexInput.java:151) ~[opensearch-2.5.0.jar:2.5.0]
    at org.opensearch.index.store.remote.file.OnDemandBlockIndexInput.demandBlock(OnDemandBlockIndexInput.java:347) ~[opensearch-2.5.0.jar:2.5.0]
    at org.opensearch.index.store.remote.file.OnDemandBlockIndexInput.seekInternal(OnDemandBlockIndexInput.java:318) ~[opensearch-2.5.0.jar:2.5.0]
    at org.opensearch.index.store.remote.file.OnDemandBlockIndexInput.seek(OnDemandBlockIndexInput.java:216) ~[opensearch-2.5.0.jar:2.5.0]
    at org.opensearch.index.store.remote.file.OnDemandBlockSnapshotIndexInput.seek(OnDemandBlockSnapshotIndexInput.java:30) ~[opensearch-2.5.0.jar:2.5.0]



kotwanikunal commented 1 year ago

From an initial dive, I think the issue lies with the security permissions around scripting modules. By default, SpecialPermission is granted to the non-plugin code base, i.e. the default OpenSearch codebase. But when using scripts, the context carries the policy of the scripting plugin, which does not hold these permissions.

This will affect all repositories that are initialized via plugins, since all of them check for this special permission: S3, GCP, and Azure.

I will continue investigating this issue.
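For context on the mechanics described above: plugin code that needs elevated permissions first checks SpecialPermission (a stack walk over every caller's protection domain) and then wraps the sensitive call in AccessController.doPrivileged, which truncates permission evaluation at the plugin's own frame. A minimal sketch of that pattern follows; the class and the stand-in permission are illustrative, not the actual OpenSearch SocketAccess source.

```java
import java.security.AccessController;
import java.security.PrivilegedAction;

// Illustrative sketch of the doPrivileged bridge used by repository plugins
// (modeled loosely on org.opensearch.repositories.azure.SocketAccess; the
// stand-in permission below is NOT the real SpecialPermission class).
final class SocketAccessSketch {

    private SocketAccessSketch() {}

    static <T> T doPrivileged(PrivilegedAction<T> action) {
        // In OpenSearch this step is SpecialPermission.check(): the
        // SecurityManager verifies that every protection domain on the call
        // stack holds org.opensearch.SpecialPermission. A painless-script
        // frame without that permission is what throws the SecurityException
        // reported in this issue.
        SecurityManager sm = System.getSecurityManager();
        if (sm != null) {
            sm.checkPermission(new RuntimePermission("specialPermissionStandIn"));
        }
        // doPrivileged cuts the stack walk off at this frame, so only the
        // plugin's own grants (plugin-security.policy) apply to the wrapped
        // call, e.g. the actual blob read over the network.
        return AccessController.doPrivileged(action);
    }

    public static void main(String[] args) {
        // With no SecurityManager installed (the default on modern JDKs),
        // the wrapped action simply runs.
        System.out.println(doPrivileged(() -> "blob-bytes"));
    }
}
```

The subtlety in this thread is which protection domains are on the stack when the check runs: with a painless script among the callers, the script's unprivileged domain is included in the walk.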

saratvemulapalli commented 1 year ago

~@kotwanikunal plugins have their own dedicated permissions which are registered with the Java Security Manager via a security policy file. This blog post talks about it in detail[1]~ Ignore me, you've already got there :). It's mostly that the caller's context matters, and the caller here is the azure plugin (not the painless script), which does not have the elevated permissions. Let's try adding it to the policy?

@dgilling assuming you are using repository-azure from this repo, could you modify the plugin-security.policy[2] file and add org.opensearch.SpecialPermission to the list? ~[1] https://opensearch.org/blog/plugins-intro/~ [2] https://github.com/opensearch-project/OpenSearch/blob/e4d9fb50b74a0b4e7f9b7bb1315e92c1f136efde/plugins/repository-azure/src/main/plugin-metadata/plugin-security.policy

kotwanikunal commented 1 year ago

> Its mostly about the caller's context matters which is azure plugin (not painless script) who does not have the elevated permissions. Lets try adding it to the policy ?

@saratvemulapalli I think the painless plugin does change the context in some way, since we can query using OpenSearch + Azure when running a non-scripted query. In the case of OpenSearch + painless + Azure, the requests raise the exception. One of the reasons SpecialPermission exists is to block scripts from accessing Azure connections, per the documentation.

dgilling commented 1 year ago

@kotwanikunal @saratvemulapalli We were thinking this as well, but could this cause a security risk?

kotwanikunal commented 1 year ago

> we were thinking this as well but could this cause a security risk?

@dgilling You can test the feature out on a development environment with the added permission. I am still looking into a fix for this.

dgilling commented 1 year ago

@kotwanikunal Interestingly, I added the permission but confirmed it doesn't work. This is on the latest 2.7 snapshot:

grant {
  // azure client opens socket connections to access the repository
  permission org.opensearch.SpecialPermission "*";
  permission java.net.SocketPermission "*", "connect,resolve";
  permission java.lang.RuntimePermission "setFactory";
  permission java.net.NetPermission "getProxySelector";
  permission java.lang.RuntimePermission "accessDeclaredMembers";
  permission java.lang.reflect.ReflectPermission "suppressAccessChecks";
  permission java.lang.RuntimePermission "setContextClassLoader";

  // azure client set Authenticator for proxy username/password
  permission java.net.NetPermission "setDefaultAuthenticator";
};
dgilling commented 1 year ago

> Interestingly, I added the permission but confirmed it doesn't work. This is on latest 2.7 snapshot

@kotwanikunal, my bad. I had a syntax error in my permission file. It works once fixed.

dgilling commented 1 year ago

@kotwanikunal @saratvemulapalli, even after fixing the syntax, it's still not working. Same error:

                    "java.base/java.security.AccessControlContext.checkPermission(AccessControlContext.java:485)",
                    "java.base/java.security.AccessController.checkPermission(AccessController.java:1068)",
                    "java.base/java.lang.SecurityManager.checkPermission(SecurityManager.java:416)",
                    "org.opensearch.SpecialPermission.check(SpecialPermission.java:104)",
                    "org.opensearch.repositories.azure.SocketAccess.doPrivilegedException(SocketAccess.java:68)",
                    "org.opensearch.repositories.azure.AzureBlobStore.getInputStream(AzureBlobStore.java:282)",
                    "org.opensearch.repositories.azure.AzureBlobContainer.openInputStream(AzureBlobContainer.java:104)",
                    "org.opensearch.repositories.azure.AzureBlobContainer.readBlob(AzureBlobContainer.java:122)",
                    "org.opensearch.index.store.remote.utils.TransferManager.downloadBlockLocally(TransferManager.java:102)",
                    "org.opensearch.index.store.remote.utils.TransferManager.fetchOriginBlob(TransferManager.java:88)",
                    "org.opensearch.index.store.remote.utils.TransferManager.lambda$fetchBlob$0(TransferManager.java:54)",
                    "org.opensearch.index.store.remote.utils.ConcurrentInvocationLinearizer.linearizeInternal(ConcurrentInvocationLinearizer.java:70)",
                    "org.opensearch.index.store.remote.utils.ConcurrentInvocationLinearizer.linearize(ConcurrentInvocationLinearizer.java:46)",
                    "org.opensearch.index.store.remote.utils.TransferManager.fetchBlob(TransferManager.java:52)",
                    "org.opensearch.index.store.remote.file.OnDemandBlockSnapshotIndexInput.fetchBlock(OnDemandBlockSnapshotIndexInput.java:148)",
                    "org.opensearch.index.store.remote.file.OnDemandBlockIndexInput.demandBlock(OnDemandBlockIndexInput.java:340)",
                    "org.opensearch.index.store.remote.file.OnDemandBlockIndexInput.seekInternal(OnDemandBlockIndexInput.java:311)",
                    "org.opensearch.index.store.remote.file.OnDemandBlockIndexInput.seek(OnDemandBlockIndexInput.java:209)",
                    "org.opensearch.index.store.remote.file.OnDemandBlockSnapshotIndexInput.seek(OnDemandBlockSnapshotIndexInput.java:28)",
                    "org.apache.lucene.codecs.lucene90.IndexedDISI.advanceBlock(IndexedDISI.java:484)",
saratvemulapalli commented 1 year ago

@dgilling are there steps I could follow to reproduce the problem?

dgilling commented 1 year ago

@saratvemulapalli yes:

Steps taken:

  1. Added permission org.opensearch.SpecialPermission; to plugins/repository-azure/plugin-security.policy similar to the permissions for core OS here
  2. Restarted nodes.
  3. Restore an index to be remote searchable.
  4. Do an aggregation like below using painless.
  "my_agg": {
      "sum": {
          "script": {
              "inline": "(doc['field_a'].empty ? 0 : doc['field_a'].value) * (doc['field_b'].empty ? 1 : doc['field_b'].value)"
          }
      }
  }
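For completeness, a sketch of the aggregation above as a full `_search` request body (sent as `POST /<restored-remote-index>/_search`; the index name is illustrative, the aggregation is taken verbatim from the snippet, and `"size": 0` just suppresses the hits):

```json
{
  "size": 0,
  "aggs": {
    "my_agg": {
      "sum": {
        "script": {
          "inline": "(doc['field_a'].empty ? 0 : doc['field_a'].value) * (doc['field_b'].empty ? 1 : doc['field_b'].value)"
        }
      }
    }
  }
}
```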

Some additional context:

We have two identical clusters with the same cluster settings, image, and data mappings, differing only in the size of the data and machines. On the tiny dev cluster, we don't hit the security exception; on the larger staging cluster, we do. In addition, when I enabled trace logs for org.opensearch.repositories and org.opensearch.index.store.remote, the small cluster does not print readBlob but the large one does. My thinking is that the small cluster hits only the cache and so never reaches that code path. However, I'm not too familiar with that code and could be wrong.

[2023-03-25T18:09:17,146][TRACE][o.o.r.a.AzureBlobContainer] [d10b89b39b03] readBlob(__5hjiSNCSTPS-FECemg.part0) from position [4051697664] with length [8388608]
[2023-03-25T18:09:17,146][TRACE][o.o.r.a.AzureBlobStore   ] [d10b89b39b03] reading container [opensearch-snapshots], blob [indices/0ID0aRZ4T-evvWREWFBI/0/__5hjiSNCSTPS-FECemgDpart0]
[2023-03-25T18:09:17,146][TRACE][o.o.r.a.AzureBlobContainer] [d10b89b39b03] readBlob(__Tm3qt-zHTVmCk858tU.part0) from position [4034920448] with length [8388608]
[2023-03-25T18:09:17,147][TRACE][o.o.r.a.AzureBlobStore   ] [d10b89b39b03] reading container [opensearch-snapshots], blob [indices/0ID0aRZ4T-evvWREWFBI/3/__Tm3qt-zHTVmCk858tU.part0]
[2023-03-25T18:09:17,148][TRACE][o.o.r.a.AzureBlobContainer] [d10b89b39b03] readBlob(___qkclJ3-RhikTZ7vQNif.part0) from position [4026531840] with length [8388608]
[2023-03-25T18:09:17,148][TRACE][o.o.r.a.AzureBlobStore   ] [d10b89b39b03] reading container [opensearch-snapshots], blob [indices/0ID0aRZ4T-evvWREWFBI/7/___qkclJ3-RhikTZ7vQNif.part0]

We are using the latest 2.7.0 snapshot from the opensearchstaging Docker Hub. However, I should note that it doesn't include the latest PR, which changes some of the logic in TransferManager; that is why my stack trace includes fetchOriginBlob.

Manifest.yml:

---
schema-version: '1.1'
build:
  name: OpenSearch
  version: 2.7.0
  platform: linux
  architecture: x64
  distribution: tar
  location: https://ci.opensearch.org/ci/dbc/distribution-build-opensearch/2.7.0/7279/linux/x64/tar/dist/opensearch/opensearch-2.7.0-linux-x64.tar.gz
  id: '7279'
components:
  - name: OpenSearch
    repository: https://github.com/opensearch-project/OpenSearch.git
    ref: 2.x
    commit_id: b392d2ab5cd6f97d621d89cb2bacbcfb35ee46eb
    location: https://ci.opensearch.org/ci/dbc/distribution-build-opensearch/2.7.0/7279/linux/x64/tar/builds/opensearch/dist/opensearch-min-2.7.0-linux-x64.tar.gz
  - name: common-utils
    repository: https://github.com/opensearch-project/common-utils.git
    ref: 2.x
    commit_id: 95d168bc340521253b17e6f2fd3bd60332323e32
  - name: job-scheduler
    repository: https://github.com/opensearch-project/job-scheduler.git
    ref: 2.x
    commit_id: 5081933bbfca9b0273a09411fb2d28f334fd09f3
    location: https://ci.opensearch.org/ci/dbc/distribution-build-opensearch/2.7.0/7279/linux/x64/tar/builds/opensearch/plugins/opensearch-job-scheduler-2.7.0.0.zip
  - name: k-NN
    repository: https://github.com/opensearch-project/k-NN.git
    ref: 2.x
    commit_id: 582369f81346bab48e121096b9d1b0bdebad9fa7
    location: https://ci.opensearch.org/ci/dbc/distribution-build-opensearch/2.7.0/7279/linux/x64/tar/builds/opensearch/plugins/opensearch-knn-2.7.0.0.zip
  - name: geospatial
    repository: https://github.com/opensearch-project/geospatial.git
    ref: 2.x
    commit_id: e2f4af418e9e772aa694d2552c25d4f9020f51a6
    location: https://ci.opensearch.org/ci/dbc/distribution-build-opensearch/2.7.0/7279/linux/x64/tar/builds/opensearch/plugins/opensearch-geospatial-2.7.0.0.zip
  - name: security
    repository: https://github.com/opensearch-project/security.git
    ref: 2.x
    commit_id: 17596f750cf0f6d1b35e4b9870218b180d4f8ede
    location: https://ci.opensearch.org/ci/dbc/distribution-build-opensearch/2.7.0/7279/linux/x64/tar/builds/opensearch/plugins/opensearch-security-2.7.0.0.zip
  - name: cross-cluster-replication
    repository: https://github.com/opensearch-project/cross-cluster-replication.git
    ref: 2.x
    commit_id: 72082ad4637005f464b54c7f86d84d90f9f3a847
    location: https://ci.opensearch.org/ci/dbc/distribution-build-opensearch/2.7.0/7279/linux/x64/tar/builds/opensearch/plugins/opensearch-cross-cluster-replication-2.7.0.0.zip
  - name: ml-commons
    repository: https://github.com/opensearch-project/ml-commons.git
    ref: 2.x
    commit_id: 021c4159c6fe0aee291dbcf7185c3c09ab99c36f
    location: https://ci.opensearch.org/ci/dbc/distribution-build-opensearch/2.7.0/7279/linux/x64/tar/builds/opensearch/plugins/opensearch-ml-2.7.0.0.zip
  - name: neural-search
    repository: https://github.com/opensearch-project/neural-search.git
    ref: 2.x
    commit_id: 081e42d0022991b1b93e8787ddee7484c00b822b
    location: https://ci.opensearch.org/ci/dbc/distribution-build-opensearch/2.7.0/7279/linux/x64/tar/builds/opensearch/plugins/opensearch-neural-search-2.7.0.0.zip
  - name: sql
    repository: https://github.com/opensearch-project/sql.git
    ref: 2.x
    commit_id: 11d0d6dd7b0dac06ba2a5918ffd4d169501b004d
    location: https://ci.opensearch.org/ci/dbc/distribution-build-opensearch/2.7.0/7279/linux/x64/tar/builds/opensearch/plugins/opensearch-sql-2.7.0.0.zip
andrross commented 1 year ago

> My thinking is small one hit the cache only and so didn't hit the condition.

That's almost certainly the case. A small dataset will quickly be pulled to disk with no need to evict anything, so it's likely that by the time the painless script runs, the data is already local.
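The cache-hit explanation above can be illustrated with a toy LRU block cache. This is purely a sketch, not the actual TransferManager/file cache implementation: once the whole working set fits locally, the remote-fetch path (where the privileged Azure read, and hence the permission check, happens) is never taken again, while a working set larger than the cache keeps evicting and re-fetching.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Illustrative sketch only: a bounded LRU cache of blocks, counting how
// often the remote-fetch path (the privileged code in the real plugin)
// actually runs.
class BlockCacheSketch {
    private final int maxBlocks;
    private final Map<String, byte[]> cache;
    private int remoteFetches = 0;

    BlockCacheSketch(int maxBlocks) {
        this.maxBlocks = maxBlocks;
        // access-order LinkedHashMap gives simple LRU eviction
        this.cache = new LinkedHashMap<String, byte[]>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, byte[]> e) {
                return size() > BlockCacheSketch.this.maxBlocks;
            }
        };
    }

    byte[] fetchBlock(String blockId, Function<String, byte[]> remoteFetch) {
        return cache.computeIfAbsent(blockId, id -> {
            remoteFetches++; // where the doPrivileged remote read would happen
            return remoteFetch.apply(id);
        });
    }

    int remoteFetches() { return remoteFetches; }

    public static void main(String[] args) {
        Function<String, byte[]> remote = id -> new byte[]{1};

        // Working set (2 blocks) fits in the cache (4): only the first two
        // lookups go remote; everything after is a local hit.
        BlockCacheSketch small = new BlockCacheSketch(4);
        for (String id : new String[]{"a", "b", "a", "b"}) small.fetchBlock(id, remote);
        System.out.println(small.remoteFetches()); // 2

        // Working set (3 blocks) exceeds the cache (2): evictions force
        // repeated remote fetches, exercising the privileged path.
        BlockCacheSketch big = new BlockCacheSketch(2);
        for (String id : new String[]{"a", "b", "c", "a"}) big.fetchBlock(id, remote);
        System.out.println(big.remoteFetches()); // 4
    }
}
```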

dgilling commented 1 year ago

@andrross makes sense. I should also note that I saw data being fetched from the remote repository for the small indices right after the snapshot restore command, so it was already in the file cache before the first search query (rather than being fetched on demand). Is that expected?

@saratvemulapalli let me know if there is something else I should try besides adding the security permission.

dgilling commented 1 year ago

Thanks @kotwanikunal. We tested #6914 and it works.

kotwanikunal commented 1 year ago

There was an additional fix merged in #6927 which covers another case where you can encounter the error. Please reopen the issue if you encounter it again even after both of these fixes.

dgilling commented 1 year ago

Much appreciated, @kotwanikunal, for fixing this. We upgraded the large staging cluster with both the #6927 and #6917 patches and so far so good :). We'll monitor and follow up if we see an issue.

Unrelated, but I'm curious whether there is an issue with the Jenkins job publishing to Docker Hub. It seems the last few builds didn't push to opensearchstaging/opensearch:2.7.0, so we had to build from source.