opensearch-project / OpenSearch

🔎 Open source distributed and RESTful search engine.
https://opensearch.org/docs/latest/opensearch/index/
Apache License 2.0
8.89k stars, 1.63k forks

Main repository azure managed identity support #12559

Closed chengwushi-netapp closed 2 weeks ago

chengwushi-netapp commented 2 months ago

Description

This PR adds support for managed identity in the repository-azure plugin.

Related Issues

Resolves #12423

Check List

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. For more information on following Developer Certificate of Origin and signing off your commits, please check here.

github-actions[bot] commented 2 months ago

:x: Gradle check result for 9d0d5658a610a7909bca5dad241cb22620ab519a: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] commented 2 months ago

:x: Gradle check result for a330e99bcafe709a0a4d49a637424f93d223c629: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] commented 2 months ago

Compatibility status:

Checks if related components are compatible with change 3cb3c7e

Incompatible components

Skipped components

Compatible components

Compatible components: [https://github.com/opensearch-project/custom-codecs.git, https://github.com/opensearch-project/asynchronous-search.git, https://github.com/opensearch-project/anomaly-detection.git, https://github.com/opensearch-project/cross-cluster-replication.git, https://github.com/opensearch-project/flow-framework.git, https://github.com/opensearch-project/job-scheduler.git, https://github.com/opensearch-project/reporting.git, https://github.com/opensearch-project/geospatial.git, https://github.com/opensearch-project/opensearch-oci-object-storage.git, https://github.com/opensearch-project/common-utils.git, https://github.com/opensearch-project/k-nn.git, https://github.com/opensearch-project/alerting.git, https://github.com/opensearch-project/neural-search.git, https://github.com/opensearch-project/security-analytics.git, https://github.com/opensearch-project/performance-analyzer-rca.git, https://github.com/opensearch-project/notifications.git, https://github.com/opensearch-project/ml-commons.git, https://github.com/opensearch-project/security.git, https://github.com/opensearch-project/index-management.git, https://github.com/opensearch-project/observability.git, https://github.com/opensearch-project/performance-analyzer.git, https://github.com/opensearch-project/sql.git]

chengwushi-netapp commented 2 months ago

This PR is expected to fail the Task :plugins:repository-azure:thirdPartyAudit because I am uncertain about the best approach to resolve this failure.

From my understanding, the thirdPartyAudit requires all dependencies to be explicitly declared. For instance, I have the following dependency tree from using com.azure:azure-identity:1.11.2.

com.azure:azure-identity:1.11.2
----net.java.dev.jna:jna-platform:5.14.0
----com.microsoft.azure:msal4j-persistence-extension:1.2.0
----com.microsoft.azure:msal4j:1.14.2
--------com.nimbusds:oauth2-oidc-sdk:11.10
------------com.nimbusds:nimbus-jose-jwt:9.37.3
------------com.nimbusds:content-type:2.3
------------com.nimbusds:lang-tag:1.7
----net.minidev:json-smart:2.5.0
--------net.minidev:accessors-smart:2.5.0
------------org.ow2.asm:asm:9.6

Note: For simplicity, I have not listed all the nested dependencies, as it would make the tree too large for demonstration purposes.

I believe to pass the Task :plugins:repository-azure:thirdPartyAudit check, I would need to declare all these nested dependencies, not just a subset of them.
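To make the size of that list concrete, here is a minimal Java sketch that flattens the hand-written tree above into the coordinates that would have to be declared individually. It assumes the ad hoc format used in this comment (four dashes per nesting level), which is not the output format of any Gradle task, and the class name `DependencyTreeFlattener` is purely illustrative.

```java
import java.util.ArrayList;
import java.util.List;

/** Flattens the dash-indented tree from this comment into a list of coordinates. */
final class DependencyTreeFlattener {

    // Tree copied from the comment above; every four dashes mark one nesting level.
    static final String TREE = String.join("\n",
        "com.azure:azure-identity:1.11.2",
        "----net.java.dev.jna:jna-platform:5.14.0",
        "----com.microsoft.azure:msal4j-persistence-extension:1.2.0",
        "----com.microsoft.azure:msal4j:1.14.2",
        "--------com.nimbusds:oauth2-oidc-sdk:11.10",
        "------------com.nimbusds:nimbus-jose-jwt:9.37.3",
        "------------com.nimbusds:content-type:2.3",
        "------------com.nimbusds:lang-tag:1.7",
        "----net.minidev:json-smart:2.5.0",
        "--------net.minidev:accessors-smart:2.5.0",
        "------------org.ow2.asm:asm:9.6");

    /** Nesting level of a line: the number of leading dashes divided by four. */
    static int depth(String line) {
        int dashes = 0;
        while (dashes < line.length() && line.charAt(dashes) == '-') {
            dashes++;
        }
        return dashes / 4;
    }

    /** Every coordinate below the root, in order of appearance, without duplicates. */
    static List<String> transitive(String tree) {
        List<String> result = new ArrayList<>();
        for (String line : tree.split("\n")) {
            int depth = depth(line);
            String coordinate = line.substring(depth * 4);
            if (depth > 0 && !result.contains(coordinate)) {
                result.add(coordinate);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Print one coordinate per line, ready to be copied into a build file.
        for (String coordinate : transitive(TREE)) {
            System.out.println(coordinate);
        }
    }
}
```

Even this shortened tree yields ten extra coordinates besides `azure-identity` itself, which is the concern behind the questions that follow.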

Therefore, I have the following questions, and I would greatly appreciate it if anyone could provide answers:

  1. Do I need to include all the nested dependencies?
     1.1 If the answer is yes, wouldn't the list of dependencies become too large? What happens if one of them contains vulnerabilities?
     1.2 If the answer is no, how do I determine which dependencies I can ignore?
  2. Is there a way to automate the addition of nested dependencies to the build file? I am currently adding them manually by referencing mavenCentral.
  3. Are there any best practices or recommended approaches when dealing with complex dependency trees when adding new dependencies in OpenSearch?
  4. I am aware of the ignoreMissingClasses in the thirdPartyAudit check. How do we determine if we can ignore a missing class found in the thirdPartyAudit check?
  5. When adding a new dependency, I was under the impression that Gradle would automatically fetch its nested dependencies. However, in my experience while testing this PR on an Azure Virtual Machine with an attached managed identity, I found that I had to manually install three nested dependencies, even though I had already included com.azure:azure-identity:1.11.2. The dependencies I had to manually add were com.microsoft.azure:msal4j:1.14.2, com.nimbusds:oauth2-oidc-sdk:11.10, and net.minidev:json-smart:2.5.0. Could anyone clarify why these dependencies weren't automatically fetched by Gradle?
chengwushi-netapp commented 2 months ago

@andrross @kotwanikunal @msfroh Hello reviewers, I am reaching out to kindly request your assistance in reviewing this pull request I've recently submitted to enable support for managed identity in the repository-azure plugin. If you could spare a moment to provide feedback, I would be deeply grateful. Many thanks for your time and consideration. Cheers.

andrross commented 2 months ago

@AmiStrn Can you help out here with reviewing the Azure managed identity parts of the PR?

chengwushi-netapp commented 2 months ago

Hello @dblock @andrross @AmiStrn and reviewers,

I've put together a concise design document that encapsulates the implementation details of this Pull Request. If you're unfamiliar with the repository-azure plugin, I believe this document could provide some useful insights.

The questions I've previously posed regarding dependencies still stand, and I would sincerely appreciate your insights on them. Your expertise and guidance are invaluable to me, and I look forward to your feedback.

Thank you for your time and consideration.

Managed-Identity-Support-Design-Doc-GitHub.docx

andrross commented 2 months ago

Thanks @chengwushi-netapp. My (possibly incorrect) understanding of how this works is that you need to explicitly list all transitive dependencies that you need to be bundled with your plugin in order to function. The third party audit will flag any classes that cannot be loaded due to missing dependencies. If you know that OpenSearch will never need to use the classes that fail, then you can add them to the ignoreMissingClasses list, otherwise you need to list the transitive dependencies that are needed in order to fix the classloader errors.
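The distinction between "missing but ignorable" and "missing and must be bundled" can be sketched in plain Java. This is only an illustration of the idea, not how the thirdPartyAudit task is implemented; the class `MissingClassCheck` and the `com.example.optional.*` names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Reports referenced classes that cannot be loaded and are not explicitly ignored. */
final class MissingClassCheck {

    /**
     * Tries to load each class name without initializing it.
     * Names on the ignore list are skipped, mirroring the role of an ignore list
     * for classes the application will never touch at runtime.
     */
    static List<String> unresolved(List<String> classNames, Set<String> ignored) {
        List<String> missing = new ArrayList<>();
        ClassLoader loader = MissingClassCheck.class.getClassLoader();
        for (String name : classNames) {
            if (ignored.contains(name)) {
                continue; // knowingly accepted as absent
            }
            try {
                Class.forName(name, false, loader);
            } catch (ClassNotFoundException | LinkageError e) {
                missing.add(name); // a dependency providing this class must be bundled
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        List<String> referenced = List.of(
            "java.util.List",                   // always present in the JDK
            "com.example.optional.NotBundled",  // hypothetical: needed but absent
            "com.example.optional.NeverUsed");  // hypothetical: absent and never used
        Set<String> ignored = Set.of("com.example.optional.NeverUsed");
        System.out.println(unresolved(referenced, ignored));
    }
}
```

Anything left in the returned list corresponds to a transitive dependency that still has to be declared.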

github-actions[bot] commented 1 month ago

:x: Gradle check result for 59e650c36aea9d793ab4326e5493de2fbd66f678: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

andrross commented 1 month ago

@saratvemulapalli Is this the correct way to add a dependency to a plugin? Certainly a lot of hoops to jump through but it appears that this is how it works.

@chengwushi-netapp can you also rebase from the latest commit on the upstream main branch?

chengwushi-netapp commented 1 month ago

> @saratvemulapalli Is this the correct way to add a dependency to a plugin? Certainly a lot of hoops to jump through but it appears that this is how it works.
>
> @chengwushi-netapp can you also rebase from the latest commit on the upstream main branch?

Hello @andrross, thank you for your valuable feedback. I have included all the necessary transitive dependencies and ignored the optional compile dependencies. Additionally, I have rebased my PR from the latest commit on the upstream main branch. Your guidance is greatly appreciated.

github-actions[bot] commented 1 month ago

:white_check_mark: Gradle check result for 2c0ced18e68cbeafcaa23062c89f9b0071a52866: SUCCESS

codecov[bot] commented 1 month ago

Codecov Report

Attention: Patch coverage is 89.04110% with 8 lines in your changes are missing coverage. Please review.

Project coverage is 71.49%. Comparing base (b15cb0c) to head (280dd0a). Report is 280 commits behind head on main.

| Files | Patch % | Lines |
|---|---|---|
| ...search/repositories/azure/AzureStorageService.java | 73.07% | 5 Missing and 2 partials :warning: |
| ...earch/repositories/azure/AzureStorageSettings.java | 97.14% | 1 Missing :warning: |
Additional details and impacted files

```diff
@@             Coverage Diff              @@
##               main   #12559      +/-   ##
============================================
+ Coverage     71.42%   71.49%   +0.07%
- Complexity    59978    61102    +1124
============================================
  Files          4985     5059      +74
  Lines        282275   287521    +5246
  Branches      40946    41646     +700
============================================
+ Hits         201603   205576    +3973
- Misses        63999    64926     +927
- Partials      16673    17019     +346
```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

github-actions[bot] commented 1 month ago

:grey_exclamation: Gradle check result for 6afcae2e0d6ba63c3d02941b75d91c309ad9f726: UNSTABLE

Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.

saratvemulapalli commented 1 month ago

> @saratvemulapalli Is this the correct way to add a dependency to a plugin? Certainly a lot of hoops to jump through but it appears that this is how it works.
>
> @chengwushi-netapp can you also rebase from the latest commit on the upstream main branch?

@andrross It's not any different from the server module. I presume you are referring to the third-party audit? Compile-time dependencies (transitive dependencies) are automatically pulled by Gradle.

github-actions[bot] commented 1 month ago

:x: Gradle check result for 77513abc77aaaff9f4044ba351dc53170e2827c4: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] commented 1 month ago

:x: Gradle check result for fd1adb43b2038d582cc8ba9227ed71654ebc7112: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] commented 1 month ago

:x: Gradle check result for ec412e3f357dd0970e09a5abae62150010dd7d89: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] commented 1 month ago

:white_check_mark: Gradle check result for 3cb3c7e2f2c2e1c3bb06355e5ccdf9e01f59ec7e: SUCCESS

reta commented 1 month ago

> @reta do you know if we document our native plugins?

Yes, we do! (if by native you mean ones we bundle with core)

saratvemulapalli commented 1 month ago

> > @reta do you know if we document our native plugins?
>
> Yes, we do! (if by native you mean ones we bundle with core)

Yeah. I believe it's worth updating the documentation for managed identity.

github-actions[bot] commented 1 month ago

:grey_exclamation: Gradle check result for 67b2bfe6a3e909ae2d1c8ef419d0461543bcb7a9: UNSTABLE

Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.

github-actions[bot] commented 1 month ago

:grey_exclamation: Gradle check result for bd20931b74df058d8e6e965dbe425c88dce40300: UNSTABLE

Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.

github-actions[bot] commented 1 month ago

:x: Gradle check result for 6a781171106dc175266b903e1e398c8294e13894: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] commented 1 month ago

:x: Gradle check result for 0e921c4b3cf52f36ef530921a11f172a68a6e739: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

chengwushi-netapp commented 1 month ago

Hello @reta, @saratvemulapalli, @andrross, and @dblock. While testing a backport of the changes in this PR to OpenSearch 2.12.0, I encountered the error java.security.AccessControlException: access denied ("java.net.SocketPermission" "169.254.169.254:80" "connect,resolve") when authenticating to blob storage from an Azure virtual machine using managed identity. Interestingly, this error does not occur in OpenSearch 2.11.1.

I investigated and suspect that the issue lies with the Java security policy.

I therefore conclude that moving the reactor-core dependency from the repository-azure plugin to the main OpenSearch is the root cause of this issue. The main OpenSearch uses a more restrictive security policy, and once reactor-core moved there, the security policy of the repository-azure plugin no longer applied to it. This raised the error java.security.AccessControlException: access denied ("java.net.SocketPermission" "169.254.169.254:80" "connect,resolve").
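As a minimal sketch of what the denied permission means, the following Java snippet uses the standard `java.net.SocketPermission.implies` method to check which grants would cover the connection named in the error. The class name `ImdsPermissionCheck` and the example grants are assumptions for illustration; the snippet only evaluates permissions and says nothing about which code base a policy file applies them to, or whether adding a grant is the right fix.

```java
import java.net.SocketPermission;

/** Checks which socket grants would cover the connection named in the error. */
final class ImdsPermissionCheck {

    // The permission reported as denied in the AccessControlException.
    static final SocketPermission REQUIRED =
        new SocketPermission("169.254.169.254:80", "connect,resolve");

    /** True if a policy entry granting {@code granted} would allow the connection. */
    static boolean covers(SocketPermission granted) {
        return granted.implies(REQUIRED);
    }

    public static void main(String[] args) {
        // A wildcard host grant covers every host and port.
        System.out.println(covers(new SocketPermission("*", "connect,resolve")));
        // A grant for exactly this address and port also covers it.
        System.out.println(covers(new SocketPermission("169.254.169.254:80", "connect,resolve")));
        // A grant for a different address and port does not.
        System.out.println(covers(new SocketPermission("127.0.0.1:9200", "connect,resolve")));
    }
}
```

A narrow grant for the single address is enough to satisfy the check, so a wildcard is broader than this particular error requires.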

I have added a new commit to fix this error. Please let me know whether this is the correct solution; any guidance on investigating this further would be greatly appreciated.

Please find the full error log below for your reference.

Apr 16 03:56:42 vm624001dbc opensearch[2193]: [2024-04-16T03:56:42,864][WARN ][r.suppressed             ] [vm624001dbc] path: /_snapshot/es-fee206bf-419e-41bf-be85-9592f898a128-snapshot-repo, params: {master_timeout=30s, repository=es-fee206bf-419e-41bf-be85-9592f898a128-snapshot-repo, timeout=30s}
Apr 16 03:56:42 vm624001dbc opensearch[2193]: org.opensearch.repositories.RepositoryVerificationException: [es-fee206bf-419e-41bf-be85-9592f898a128-snapshot-repo] path [91bf0e42-83aa-4aaa-8081-ba949cbebdcc] is not accessible on cluster-manager node
Apr 16 03:56:42 vm624001dbc opensearch[2193]:         at org.opensearch.repositories.blobstore.BlobStoreRepository.startVerification(BlobStoreRepository.java:1973) ~[opensearch-2.12.0.jar:2.12.0]
Apr 16 03:56:42 vm624001dbc opensearch[2193]:         at org.opensearch.repositories.RepositoriesService$3.doRun(RepositoriesService.java:373) ~[opensearch-2.12.0.jar:2.12.0]
Apr 16 03:56:42 vm624001dbc opensearch[2193]:         at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:913) [opensearch-2.12.0.jar:2.12.0]
Apr 16 03:56:42 vm624001dbc opensearch[2193]:         at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52) [opensearch-2.12.0.jar:2.12.0]
Apr 16 03:56:42 vm624001dbc opensearch[2193]:         at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144) [?:?]
Apr 16 03:56:42 vm624001dbc opensearch[2193]:         at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642) [?:?]
Apr 16 03:56:42 vm624001dbc opensearch[2193]:         at java.base/java.lang.Thread.run(Thread.java:1583) [?:?]
Apr 16 03:56:42 vm624001dbc opensearch[2193]: Caused by: com.microsoft.aad.msal4j.MsalAzureSDKException: java.util.concurrent.ExecutionException: com.azure.identity.CredentialUnavailableException: ManagedIdentityCredential authentication unavailable. Connection to IMDS endpoint cannot be established, access denied ("java.net.SocketPermission" "169.254.169.254:80" "connect,resolve").
Apr 16 03:56:42 vm624001dbc opensearch[2193]:         at com.microsoft.aad.msal4j.AcquireTokenByAppProviderSupplier.fetchTokenUsingAppTokenProvider(AcquireTokenByAppProviderSupplier.java:79) ~[?:?]
Apr 16 03:56:42 vm624001dbc opensearch[2193]:         at com.microsoft.aad.msal4j.AcquireTokenByAppProviderSupplier.execute(AcquireTokenByAppProviderSupplier.java:56) ~[?:?]
Apr 16 03:56:42 vm624001dbc opensearch[2193]:         at com.microsoft.aad.msal4j.AcquireTokenByClientCredentialSupplier.acquireTokenByClientCredential(AcquireTokenByClientCredentialSupplier.java:78) ~[?:?]
Apr 16 03:56:42 vm624001dbc opensearch[2193]:         at com.microsoft.aad.msal4j.AcquireTokenByClientCredentialSupplier.execute(AcquireTokenByClientCredentialSupplier.java:49) ~[?:?]
Apr 16 03:56:42 vm624001dbc opensearch[2193]:         at com.microsoft.aad.msal4j.AuthenticationResultSupplier.get(AuthenticationResultSupplier.java:69) ~[?:?]
Apr 16 03:56:42 vm624001dbc opensearch[2193]:         at com.microsoft.aad.msal4j.AuthenticationResultSupplier.get(AuthenticationResultSupplier.java:18) ~[?:?]
Apr 16 03:56:42 vm624001dbc opensearch[2193]:         at java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1768) ~[?:?]
Apr 16 03:56:42 vm624001dbc opensearch[2193]:         ... 1 more
Apr 16 03:56:42 vm624001dbc opensearch[2193]: Caused by: java.util.concurrent.ExecutionException: com.azure.identity.CredentialUnavailableException: ManagedIdentityCredential authentication unavailable. Connection to IMDS endpoint cannot be established, access denied ("java.net.SocketPermission" "169.254.169.254:80" "connect,resolve").
Apr 16 03:56:42 vm624001dbc opensearch[2193]:         at java.base/java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:396) ~[?:?]
Apr 16 03:56:42 vm624001dbc opensearch[2193]:         at java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:2073) ~[?:?]
Apr 16 03:56:42 vm624001dbc opensearch[2193]:         at com.microsoft.aad.msal4j.AcquireTokenByAppProviderSupplier.fetchTokenUsingAppTokenProvider(AcquireTokenByAppProviderSupplier.java:76) ~[?:?]
Apr 16 03:56:42 vm624001dbc opensearch[2193]:         at com.microsoft.aad.msal4j.AcquireTokenByAppProviderSupplier.execute(AcquireTokenByAppProviderSupplier.java:56) ~[?:?]
Apr 16 03:56:42 vm624001dbc opensearch[2193]:         at com.microsoft.aad.msal4j.AcquireTokenByClientCredentialSupplier.acquireTokenByClientCredential(AcquireTokenByClientCredentialSupplier.java:78) ~[?:?]
Apr 16 03:56:42 vm624001dbc opensearch[2193]:         at com.microsoft.aad.msal4j.AcquireTokenByClientCredentialSupplier.execute(AcquireTokenByClientCredentialSupplier.java:49) ~[?:?]
Apr 16 03:56:42 vm624001dbc opensearch[2193]:         at com.microsoft.aad.msal4j.AuthenticationResultSupplier.get(AuthenticationResultSupplier.java:69) ~[?:?]
Apr 16 03:56:42 vm624001dbc opensearch[2193]:         at com.microsoft.aad.msal4j.AuthenticationResultSupplier.get(AuthenticationResultSupplier.java:18) ~[?:?]
Apr 16 03:56:42 vm624001dbc opensearch[2193]:         at java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1768) ~[?:?]
Apr 16 03:56:42 vm624001dbc opensearch[2193]:         ... 1 more
Apr 16 03:56:42 vm624001dbc opensearch[2193]: Caused by: com.azure.identity.CredentialUnavailableException: ManagedIdentityCredential authentication unavailable. Connection to IMDS endpoint cannot be established, access denied ("java.net.SocketPermission" "169.254.169.254:80" "connect,resolve").
Apr 16 03:56:42 vm624001dbc opensearch[2193]:         at com.azure.identity.implementation.IdentityClient.lambda$checkIMDSAvailable$63(IdentityClient.java:1270) ~[?:?]
Apr 16 03:56:42 vm624001dbc opensearch[2193]:         at reactor.core.publisher.MonoCallable.call(MonoCallable.java:72) ~[reactor-core-3.5.14.jar:3.5.14]
Apr 16 03:56:42 vm624001dbc opensearch[2193]:         at reactor.core.publisher.FluxFlatMap.trySubscribeScalarMap(FluxFlatMap.java:127) ~[reactor-core-3.5.14.jar:3.5.14]
Apr 16 03:56:42 vm624001dbc opensearch[2193]:         at reactor.core.publisher.MonoFlatMap.subscribeOrReturn(MonoFlatMap.java:53) ~[reactor-core-3.5.14.jar:3.5.14]
Apr 16 03:56:42 vm624001dbc opensearch[2193]:         at reactor.core.publisher.Mono.subscribe(Mono.java:4480) ~[reactor-core-3.5.14.jar:3.5.14]
Apr 16 03:56:42 vm624001dbc opensearch[2193]:         at reactor.core.publisher.Mono.subscribeWith(Mono.java:4561) ~[reactor-core-3.5.14.jar:3.5.14]
Apr 16 03:56:42 vm624001dbc opensearch[2193]:         at reactor.core.publisher.Mono.toFuture(Mono.java:5073) ~[reactor-core-3.5.14.jar:3.5.14]
Apr 16 03:56:42 vm624001dbc opensearch[2193]:         at com.azure.identity.implementation.IdentityClientBase.lambda$getManagedIdentityConfidentialClient$3(IdentityClientBase.java:419) ~[?:?]
Apr 16 03:56:42 vm624001dbc opensearch[2193]:         at com.microsoft.aad.msal4j.AcquireTokenByAppProviderSupplier.fetchTokenUsingAppTokenProvider(AcquireTokenByAppProviderSupplier.java:75) ~[?:?]
Apr 16 03:56:42 vm624001dbc opensearch[2193]:         at com.microsoft.aad.msal4j.AcquireTokenByAppProviderSupplier.execute(AcquireTokenByAppProviderSupplier.java:56) ~[?:?]
Apr 16 03:56:42 vm624001dbc opensearch[2193]:         at com.microsoft.aad.msal4j.AcquireTokenByClientCredentialSupplier.acquireTokenByClientCredential(AcquireTokenByClientCredentialSupplier.java:78) ~[?:?]
Apr 16 03:56:42 vm624001dbc opensearch[2193]:         at com.microsoft.aad.msal4j.AcquireTokenByClientCredentialSupplier.execute(AcquireTokenByClientCredentialSupplier.java:49) ~[?:?]
Apr 16 03:56:42 vm624001dbc opensearch[2193]:         at com.microsoft.aad.msal4j.AuthenticationResultSupplier.get(AuthenticationResultSupplier.java:69) ~[?:?]
Apr 16 03:56:42 vm624001dbc opensearch[2193]:         at com.microsoft.aad.msal4j.AuthenticationResultSupplier.get(AuthenticationResultSupplier.java:18) ~[?:?]
Apr 16 03:56:42 vm624001dbc opensearch[2193]:         at java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1768) ~[?:?]
Apr 16 03:56:42 vm624001dbc opensearch[2193]:         ... 1 more
Apr 16 03:56:42 vm624001dbc opensearch[2193]: Caused by: java.security.AccessControlException: access denied ("java.net.SocketPermission" "169.254.169.254:80" "connect,resolve")
Apr 16 03:56:42 vm624001dbc opensearch[2193]:         at java.base/java.security.AccessControlContext.checkPermission(AccessControlContext.java:488) ~[?:?]
Apr 16 03:56:42 vm624001dbc opensearch[2193]:         at java.base/java.security.AccessController.checkPermission(AccessController.java:1071) ~[?:?]
Apr 16 03:56:42 vm624001dbc opensearch[2193]:         at java.base/java.lang.SecurityManager.checkPermission(SecurityManager.java:411) ~[?:?]
Apr 16 03:56:42 vm624001dbc opensearch[2193]:         at java.base/java.lang.SecurityManager.checkConnect(SecurityManager.java:905) ~[?:?]
Apr 16 03:56:42 vm624001dbc opensearch[2193]:         at java.base/sun.net.www.http.HttpClient.openServer(HttpClient.java:619) ~[?:?]
Apr 16 03:56:42 vm624001dbc opensearch[2193]:         at java.base/sun.net.www.http.HttpClient.<init>(HttpClient.java:280) ~[?:?]
Apr 16 03:56:42 vm624001dbc opensearch[2193]:         at java.base/sun.net.www.http.HttpClient.New(HttpClient.java:386) ~[?:?]
Apr 16 03:56:42 vm624001dbc opensearch[2193]:         at java.base/sun.net.www.http.HttpClient.New(HttpClient.java:408) ~[?:?]
Apr 16 03:56:42 vm624001dbc opensearch[2193]:         at java.base/sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:1304) ~[?:?]
Apr 16 03:56:42 vm624001dbc opensearch[2193]:         at java.base/sun.net.www.protocol.http.HttpURLConnection.plainConnect0(HttpURLConnection.java:1237) ~[?:?]
Apr 16 03:56:42 vm624001dbc opensearch[2193]:         at java.base/sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:1123) ~[?:?]
Apr 16 03:56:42 vm624001dbc opensearch[2193]:         at java.base/sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:1052) ~[?:?]
Apr 16 03:56:42 vm624001dbc opensearch[2193]:         at com.azure.identity.implementation.IdentityClient.lambda$checkIMDSAvailable$63(IdentityClient.java:1264) ~[?:?]
Apr 16 03:56:42 vm624001dbc opensearch[2193]:         at reactor.core.publisher.MonoCallable.call(MonoCallable.java:72) ~[reactor-core-3.5.14.jar:3.5.14]
Apr 16 03:56:42 vm624001dbc opensearch[2193]:         at reactor.core.publisher.FluxFlatMap.trySubscribeScalarMap(FluxFlatMap.java:127) ~[reactor-core-3.5.14.jar:3.5.14]
Apr 16 03:56:42 vm624001dbc opensearch[2193]:         at reactor.core.publisher.MonoFlatMap.subscribeOrReturn(MonoFlatMap.java:53) ~[reactor-core-3.5.14.jar:3.5.14]
Apr 16 03:56:42 vm624001dbc opensearch[2193]:         at reactor.core.publisher.Mono.subscribe(Mono.java:4480) ~[reactor-core-3.5.14.jar:3.5.14]
Apr 16 03:56:42 vm624001dbc opensearch[2193]:         at reactor.core.publisher.Mono.subscribeWith(Mono.java:4561) ~[reactor-core-3.5.14.jar:3.5.14]
Apr 16 03:56:42 vm624001dbc opensearch[2193]:         at reactor.core.publisher.Mono.toFuture(Mono.java:5073) ~[reactor-core-3.5.14.jar:3.5.14]
Apr 16 03:56:42 vm624001dbc opensearch[2193]:         at com.azure.identity.implementation.IdentityClientBase.lambda$getManagedIdentityConfidentialClient$3(IdentityClientBase.java:419) ~[?:?]
Apr 16 03:56:42 vm624001dbc opensearch[2193]:         at com.microsoft.aad.msal4j.AcquireTokenByAppProviderSupplier.fetchTokenUsingAppTokenProvider(AcquireTokenByAppProviderSupplier.java:75) ~[?:?]
Apr 16 03:56:42 vm624001dbc opensearch[2193]:         at com.microsoft.aad.msal4j.AcquireTokenByAppProviderSupplier.execute(AcquireTokenByAppProviderSupplier.java:56) ~[?:?]
Apr 16 03:56:42 vm624001dbc opensearch[2193]:         at com.microsoft.aad.msal4j.AcquireTokenByClientCredentialSupplier.acquireTokenByClientCredential(AcquireTokenByClientCredentialSupplier.java:78) ~[?:?]
Apr 16 03:56:42 vm624001dbc opensearch[2193]:         at com.microsoft.aad.msal4j.AcquireTokenByClientCredentialSupplier.execute(AcquireTokenByClientCredentialSupplier.java:49) ~[?:?]
Apr 16 03:56:42 vm624001dbc opensearch[2193]:         at com.microsoft.aad.msal4j.AuthenticationResultSupplier.get(AuthenticationResultSupplier.java:69) ~[?:?]
Apr 16 03:56:42 vm624001dbc opensearch[2193]:         at com.microsoft.aad.msal4j.AuthenticationResultSupplier.get(AuthenticationResultSupplier.java:18) ~[?:?]
Apr 16 03:56:42 vm624001dbc opensearch[2193]:         at java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1768) ~[?:?]
Apr 16 03:56:42 vm624001dbc opensearch[2193]:         ... 1 more
github-actions[bot] commented 1 month ago

:white_check_mark: Gradle check result for c3c090a8bcc0152ca2f9c1d5e7fbeaf2bb07d895: SUCCESS

reta commented 1 month ago

> Please find the full error log below for your reference.

@chengwushi-netapp thanks a lot for running this test. I think I know where the problem is (it is not related to security policies), but I need to look at where (and how) to apply the fix. I have very limited availability this week, so I may not find time until next week (sorry about that). In general, it seems that some execution is happening in a thread that does not have the right security context (thread group).

chengwushi-netapp commented 1 month ago

> > Please find the full error log below for your reference.
>
> @chengwushi-netapp thanks a lot for running this test. I think I know where the problem is (it is not related to security policies), but I need to look at where (and how) to apply the fix. I have very limited availability this week, so I may not find time until next week (sorry about that). In general, it seems that some execution is happening in a thread that does not have the right security context (thread group).

Thank you @reta for the confirmation. I have removed the added grant codeBase "${codebase.reactor-core}" { permission java.net.SocketPermission "*", "connect,resolve";}, since security policies are not the root cause of this problem.

I assume the fix for this problem will be in a separate PR, @reta? I am happy to include the fix in this PR if that is easier; however, I will probably need your help with it.

Lastly, do let me know if you have any more feedback on this PR when you have time next week.

github-actions[bot] commented 1 month ago

:white_check_mark: Gradle check result for c676f6e59a1778a6303db2ab532938eb84088473: SUCCESS

github-actions[bot] commented 1 month ago

:x: Gradle check result for 890ab43ac809686d091e256f8a1c7b468624eecc: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] commented 1 month ago

:white_check_mark: Gradle check result for fc2541756c9570d13ff880b7b932351253fe65f2: SUCCESS

reta commented 1 month ago

> I assume the fix for this problem will be in a separate PR, @reta? I am happy to include the fix in this PR if that is easier; however, I will probably need your help with it.

The plugin is unusable without fixing the security permissions, so we have to fix the issue before getting the code in. The problem is more complicated than I thought. Could you please try this workaround in AzureStorageSettings while I work on getting Azure access to help you troubleshoot the issue:

            this.clientBuilder = (builder) -> builder.credential(new ManagedIdentityCredentialBuilder() {
                @Override
                public ManagedIdentityCredential build() {
                    CredentialBuilderBaseHelper.getClientOptions(this).setExecutorService(OpenSearchExecutors.newDirectExecutorService());
                    return super.build();
                }
            }.build()).endpoint(getStorageEndpoint().getPrimaryUri());

Thank you.
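The workaround above swaps the credential's executor for `OpenSearchExecutors.newDirectExecutorService()`. As a JDK-only sketch of the general idea, assuming that a direct executor service runs each task on the thread that submits it, the snippet below compares a pool with such an executor. The class `DirectExecutorDemo` and its methods are illustrative and are not part of OpenSearch or the Azure SDK.

```java
import java.util.List;
import java.util.concurrent.AbstractExecutorService;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Shows that a direct executor runs tasks on the submitting thread, unlike a pool. */
final class DirectExecutorDemo {

    /** A minimal executor service whose execute() simply calls run() inline. */
    static ExecutorService directExecutorService() {
        return new AbstractExecutorService() {
            private volatile boolean shutdown;

            @Override public void execute(Runnable command) { command.run(); }
            @Override public void shutdown() { shutdown = true; }
            @Override public List<Runnable> shutdownNow() { shutdown = true; return List.of(); }
            @Override public boolean isShutdown() { return shutdown; }
            @Override public boolean isTerminated() { return shutdown; }
            @Override public boolean awaitTermination(long timeout, TimeUnit unit) { return shutdown; }
        };
    }

    /** True if a task submitted to the executor ran on the calling thread. */
    static boolean runsOnCaller(ExecutorService executor) {
        Callable<Thread> task = Thread::currentThread;
        try {
            return executor.submit(task).get() == Thread.currentThread();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    static boolean directRunsOnCaller() {
        return runsOnCaller(directExecutorService());
    }

    static boolean poolRunsOnCaller() {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            return runsOnCaller(pool);
        } finally {
            pool.shutdown(); // let the JVM exit
        }
    }

    public static void main(String[] args) {
        System.out.println("direct executor ran on caller thread: " + directRunsOnCaller());
        System.out.println("pool executor ran on caller thread: " + poolRunsOnCaller());
    }
}
```

Running work inline keeps it on the caller's thread, and therefore in whatever security context that thread carries, instead of handing it to a pool thread created elsewhere.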

chengwushi-netapp commented 1 month ago

> > I assume the fix for this problem will be in a separate PR, @reta? I am happy to include the fix in this PR if that is easier; however, I will probably need your help with it.
>
> The plugin is unusable without fixing the security permissions, so we have to fix the issue before getting the code in. The problem is more complicated than I thought. Could you please try this workaround in AzureStorageSettings while I work on getting Azure access to help you troubleshoot the issue:
>
>             this.clientBuilder = (builder) -> builder.credential(new ManagedIdentityCredentialBuilder() {
>                 @Override
>                 public ManagedIdentityCredential build() {
>                     CredentialBuilderBaseHelper.getClientOptions(this).setExecutorService(OpenSearchExecutors.newDirectExecutorService());
>                     return super.build();
>                 }
>             }.build()).endpoint(getStorageEndpoint().getPrimaryUri());
>
> Thank you.

Hello @reta, I have tried the workaround you provided, and I am seeing the same authentication error when hitting the endpoint /_snapshot/<snapshot-repo-name>/_all. Please find attached the error log.

azure-managed-identity-not-available-error-log.txt

github-actions[bot] commented 1 month ago

:white_check_mark: Gradle check result for 7b6acb6669ae219d3ee0ee564510f38735316f2d: SUCCESS

github-actions[bot] commented 1 month ago

:x: Gradle check result for 24b6c7a66fc799b2690042ed1f810f383314205b: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

reta commented 1 month ago

> Hello @reta, I have tried the workaround you provided, and I am seeing the same authentication error when hitting the endpoint /_snapshot/<snapshot-repo-name>/_all. Please find attached the error log.

Thanks @chengwushi-netapp , on it

reta commented 1 month ago

> Hello @reta, I have tried the workaround you provided, and I am seeing the same authentication error when hitting the endpoint /_snapshot/<snapshot-repo-name>/_all. Please find attached the error log.

@chengwushi-netapp I found the problem (and the fix), but it is a bit complicated. Would you mind if I push this particular change into your pull request? Thank you.

hdhalter commented 1 month ago

Hi @chengwushi-netapp, will you be creating the doc PR for this change? Here is the doc issue: https://github.com/opensearch-project/documentation-website/issues/6874. Thanks!

chengwushi-netapp commented 1 month ago

Hello @reta, of course you can. Please feel free to make any changes that are required. I just rebased the branch in the hope of fixing the failing tests, so do remember to update your local branch. Cheers

chengwushi-netapp commented 1 month ago

Hi @chengwushi-netapp, will you be creating the doc PR for this change? Here is the doc issue: opensearch-project/documentation-website#6874. Thanks!

Hello @hdhalter, my apologies for overlooking your questions under that issue. Yes, I was planning to create the doc PR after getting approval on this PR. But it looks like I have missed the doc PR cutoff, which was yesterday? Let me quickly draft a doc PR today in the hope that I can still make it. Cheers

github-actions[bot] commented 1 month ago

:white_check_mark: Gradle check result for 9152c739de95dc71acb385e81f5dff6c47b5a8a2: SUCCESS

chengwushi-netapp commented 1 month ago

@reta do you know if we document our native plugins?

Yes, we do! (if by native you mean ones we bundle with core)

Hello @reta, just wondering where I could find the documentation for the repository-azure plugin? I did not manage to find any dedicated section for the repository-azure plugin in the documentation-website repo.

reta commented 1 month ago

Hello @reta, just wondering where I could find the documentation for the repository-azure plugin?

Hello @chengwushi-netapp, we sadly have gaps in the documentation, please see https://github.com/opensearch-project/documentation-website/issues/417 :(

github-actions[bot] commented 1 month ago

:x: Gradle check result for d9c2939ea58c513e4c3ea228b7478d175b368b62: FAILURE

github-actions[bot] commented 1 month ago

:x: Gradle check result for be490690bf37566582ac2ddae1ef1517b9a9fc9f: FAILURE

github-actions[bot] commented 1 month ago

:grey_exclamation: Gradle check result for 18cb593bf089f8c7de411e392c405a1af9909dfa: UNSTABLE

Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.

chengwushi-netapp commented 4 weeks ago

@reta You are awesome! Thank you for the fix and for refactoring some of my code :)

Just wondering if there are any more action items needed from my end in order to get this PR approved?

Cheers, Chengwu.

reta commented 4 weeks ago

Thanks @chengwushi-netapp, could you just confirm it works e2e from your side? I sadly haven't gotten my Azure access yet, but I was able to reproduce the issue in a simulated environment (to make sure it was fixed).

chengwushi-netapp commented 3 weeks ago

Hello @reta, I have tested the code on an Azure VM with a managed identity attached, and confirmed that it is working by hitting multiple API endpoints such as

  1. GET /_snapshot/<snapshot-repo-name>/_all,
  2. PUT /_snapshot/<repo-name>/snapshot_1?wait_for_completion=true and
  3. DELETE /_snapshot/<repo-name>/<snapshot-name>.
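
For anyone who wants to repeat this check, the three calls above could be scripted roughly as follows. This is only a sketch using the JDK's `java.net.http` API: the base URL `http://localhost:9200`, the class name `SnapshotSmokeRequests`, and the repository and snapshot names passed in are all assumptions for illustration, not values taken from this PR. The code only builds the requests, so it runs without a cluster.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.util.List;

// Hypothetical helper that builds the three snapshot requests listed above.
final class SnapshotSmokeRequests {
    // Assumption: a local node listening on the default HTTP port without TLS.
    static final String BASE = "http://localhost:9200";

    static List<HttpRequest> build(String repo, String snapshot) {
        String repoUrl = BASE + "/_snapshot/" + repo;

        // 1. List all snapshots in the repository.
        HttpRequest list = HttpRequest.newBuilder(URI.create(repoUrl + "/_all"))
            .GET()
            .build();

        // 2. Create a snapshot and wait for it to complete.
        HttpRequest create = HttpRequest.newBuilder(
                URI.create(repoUrl + "/" + snapshot + "?wait_for_completion=true"))
            .PUT(HttpRequest.BodyPublishers.noBody())
            .build();

        // 3. Delete the snapshot again.
        HttpRequest delete = HttpRequest.newBuilder(URI.create(repoUrl + "/" + snapshot))
            .DELETE()
            .build();

        return List.of(list, create, delete);
    }

    public static void main(String[] args) {
        // "my-azure-repo" and "snapshot_1" are placeholder names.
        for (HttpRequest request : build("my-azure-repo", "snapshot_1")) {
            System.out.println(request.method() + " " + request.uri());
        }
    }
}
```

To actually send them, each request can be passed to `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())`; a secured cluster would additionally need credentials and TLS settings, which are left out here.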