trinodb / trino

Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
Apache License 2.0

Iceberg REST Catalog: Support for vended credentials for Azure #23238

Open c-thiel opened 2 months ago

c-thiel commented 2 months ago

I am currently extending the integration tests for our Iceberg REST catalog implementation.

The S3 integration with Trino works nicely, but I can't get vended credentials up and running for Azure.

My configuration looks as follows:

    CREATE CATALOG test_azure USING iceberg
    WITH (
        "iceberg.catalog.type" = 'rest',
        "iceberg.rest-catalog.uri" = 'https://api.tabular.io/ws/',
        "iceberg.rest-catalog.warehouse" = 'cth-azure',
        "iceberg.rest-catalog.security" = 'OAUTH2',
        "iceberg.rest-catalog.oauth2.token" = '{token}',
        "iceberg.rest-catalog.vended-credentials-enabled" = 'true',
        "fs.native-azure.enabled" = 'true'
    )

I can use all endpoints as usual, but data operations fail with the following error:

TrinoExternalError(type=EXTERNAL, name=ICEBERG_FILESYSTEM_ERROR, message="Failed checking new table's location: abfss://<filesystem>@<storage-account-name>.dfs.core.windows.net/0191b3ad-6c11-7fc2-bb2a-12a7648dc115/my_table-a90cff2771dd44d3aa55235cf4f77a47", query_id=20240902_165932_00260_3zngw)

I believe the config attribute returned for the table, adls.sas-token.<storage-account-name>.dfs.core.windows.net: "skoid=....", is not being used. Instead, Trino tries to load credentials from well-known locations:

2024-09-02T16:59:32.697Z        INFO    Query-20240902_165932_00258_3zngw-817   com.azure.identity.ChainedTokenCredential       Azure Identity => Attempted credential EnvironmentCredential is unavailable.
2024-09-02T16:59:32.697Z        INFO    Query-20240902_165932_00258_3zngw-817   com.azure.identity.ChainedTokenCredential       Azure Identity => Attempted credential WorkloadIdentityCredential is unavailable.
2024-09-02T16:59:32.698Z        WARN    ForkJoinPool.commonPool-worker-15       com.microsoft.aad.msal4j.ConfidentialClientApplication  [Correlation ID: aac0fdfd-a2cf-4916-8da2-704b4a23630c] Execution of class com.microsoft.aad.msal4j.AcquireTokenByClientCredentialSupplier failed: java.util.concurrent.ExecutionException: com.azure.identity.CredentialUnavailableException: ManagedIdentityCredential authentication unavailable. Connection to IMDS endpoint cannot be established, Connection refused.
2024-09-02T16:59:32.698Z        INFO    ForkJoinPool.commonPool-worker-15       com.azure.identity.ChainedTokenCredential       Azure Identity => Attempted credential ManagedIdentityCredential is unavailable.
2024-09-02T16:59:32.698Z        INFO    ForkJoinPool.commonPool-worker-16       com.azure.identity.ChainedTokenCredential       Azure Identity => Attempted credential SharedTokenCacheCredential is unavailable.
2024-09-02T16:59:32.698Z        INFO    ForkJoinPool.commonPool-worker-16       com.azure.identity.ChainedTokenCredential       Azure Identity => Attempted credential IntelliJCredential is unavailable.
2024-09-02T16:59:32.701Z        INFO    ForkJoinPool.commonPool-worker-16       com.azure.identity.ChainedTokenCredential       Azure Identity => Attempted credential AzureCliCredential is unavailable.
2024-09-02T16:59:32.705Z        INFO    ForkJoinPool.commonPool-worker-16       com.azure.identity.ChainedTokenCredential       Azure Identity => Attempted credential AzurePowerShellCredential is unavailable.
2024-09-02T16:59:32.706Z        INFO    ForkJoinPool.commonPool-worker-16       com.azure.identity.ChainedTokenCredential       Azure Identity => Attempted credential AzureDeveloperCliCredential is unavailable.
2024-09-02T16:59:32.707Z        ERROR   ForkJoinPool.commonPool-worker-16       com.azure.core.implementation.AccessTokenCache  {"az.sdk.message":"Failed to acquire a new access token.","exception":"EnvironmentCredential authentication unavailable. Environment variables are not fully configured.To mitigate this issue, please refer to the troubleshooting guidelines here at https://aka.ms/azsdk/java/identity/environmentcredential/troubleshoot\r\nWorkloadIdentityCredential authentication unavailable. The workload options are not fully configured. See the troubleshooting guide for more information. https://aka.ms/azsdk/java/identity/workloadidentitycredential/troubleshoot\r\nManaged Identity authentication is not available.\r\nSharedTokenCacheCredential authentication unavailable. No accounts were found in the cache.\r\nIntelliJ Authentication not available. Please log in with Azure Tools for IntelliJ plugin in the IDE. Fore more details refer to the troubleshooting guidelines here at https://aka.ms/azsdk/java/identity/intellijcredential/troubleshoot\r\nAzureCliCredential authentication unavailable. Azure CLI not installed.To mitigate this issue, please refer to the troubleshooting guidelines here at https://aka.ms/azsdk/java/identity/azclicredential/troubleshoot\r\nEncountered error when deserializing response from Azure Power Shell.\r\nAzureDeveloperCliCredential authentication unavailable. Azure Developer CLI not installed.To mitigate this issue, please refer to the troubleshooting guidelines here at https://aka.ms/azsdk/java/identity/azdevclicredential/troubleshootTo mitigate this issue, please refer to the troubleshooting guidelines here at https://aka.ms/azure-identity-java-default-azure-credential-troubleshoot"}
2024-09-02T16:59:32.707Z        INFO    dispatcher-query-86     io.trino.event.QueryMonitor     TIMELINE: Query 20240902_165932_00258_3zngw :: FAILED (ICEBERG_FILESYSTEM_ERROR) :: elapsed 601ms :: planning 0ms :: waiting 0ms :: scheduling 601ms :: running 0ms :: finishing 601ms :: begin 2024-09-02T16:59:32.105Z :: end 2024-09-02T16:59:32.706Z

It would be great if someone could check why the returned SAS token is not being used. The Java Iceberg package supports it, and Spark respects the token as well.
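To make the expected behavior concrete, here is a minimal Python sketch of the lookup I would expect: take the host from the table location and look for the matching adls.sas-token.<host> key in the table config. The function name find_sas_token and the sample values are hypothetical; this only illustrates the key matching and is not how Trino or the Iceberg Java library implements it.

```python
from typing import Mapping, Optional
from urllib.parse import urlparse

# Key prefix as it appears in the table config returned by the catalog.
SAS_TOKEN_PREFIX = "adls.sas-token."


def find_sas_token(table_config: Mapping[str, str], location: str) -> Optional[str]:
    """Return the vended SAS token for the storage account in `location`, if present.

    Hypothetical helper: `location` is an abfss:// URI such as
    abfss://<filesystem>@<storage-account-name>.dfs.core.windows.net/<path>.
    """
    # hostname drops the "<filesystem>@" part and keeps the account host.
    host = urlparse(location).hostname
    if host is None:
        return None
    return table_config.get(SAS_TOKEN_PREFIX + host)
```

If such a lookup returns a token, the client would be expected to use it for that storage account instead of falling back to the default credential chain shown in the log above.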

Let me know if I can support this in any way!

mayankvadariya commented 1 month ago

@c-thiel it appears that vended credential support was added only for S3 https://github.com/trinodb/trino/pull/20186/files#diff-300020eb68920cea152529486b75fddcf41b4fa85c10be3ce893a17d4537881bR105-R106

Please feel free to create a PR for Azure.

mayankvadariya commented 1 month ago

@cgpoh https://github.com/apache/polaris/pull/44#issuecomment-2323108039 can you please advise whether you were able to use vended credentials with Azure successfully?

@c-thiel you may want to try the config given in the comment linked above to see if it works.

cgpoh commented 1 month ago

@mayankvadariya my company's policy disables SAS token generation, so vended credentials do not work for me. I need to add these properties to get Trino working with Polaris:

    fs.native-azure.enabled=true
    azure.auth-type=OAUTH
    azure.oauth.tenant-id=tenant-id
    azure.oauth.endpoint=https://login.microsoftonline.com/tenant-id/oauth2/token
    azure.oauth.client-id=client-id
    azure.oauth.secret=client-secret

With the above properties, we can either remove the iceberg.rest-catalog.vended-credentials-enabled property or set it to false.
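For a catalog created with CREATE CATALOG, as in the original report, the same properties would go into the WITH clause. Below is a minimal Python sketch that renders such a statement from a dict. The helper name render_create_catalog and the <rest-catalog-uri> placeholder are hypothetical, and it assumes these properties are accepted in a dynamic catalog the same way as in a catalog properties file, which I have not verified.

```python
from typing import Mapping


def render_create_catalog(name: str, connector: str, properties: Mapping[str, str]) -> str:
    """Render a CREATE CATALOG statement with properties in the WITH clause."""

    def quote(value: str) -> str:
        # SQL string literal: escape single quotes by doubling them.
        return "'" + value.replace("'", "''") + "'"

    lines = [f'    "{key}" = {quote(value)}' for key, value in properties.items()]
    return (
        f"CREATE CATALOG {name} USING {connector}\n"
        "WITH (\n" + ",\n".join(lines) + "\n)"
    )


# Placeholder values; replace tenant-id, client-id, and client-secret with real ones.
statement = render_create_catalog(
    "test_azure",
    "iceberg",
    {
        "iceberg.catalog.type": "rest",
        "iceberg.rest-catalog.uri": "<rest-catalog-uri>",
        "fs.native-azure.enabled": "true",
        "azure.auth-type": "OAUTH",
        "azure.oauth.tenant-id": "tenant-id",
        "azure.oauth.endpoint": "https://login.microsoftonline.com/tenant-id/oauth2/token",
        "azure.oauth.client-id": "client-id",
        "azure.oauth.secret": "client-secret",
    },
)
print(statement)
```

Keeping the properties in a dict makes it easy to switch between this OAuth setup and a vended-credentials setup in integration tests without editing SQL by hand.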

mayankvadariya commented 1 month ago

Thanks for confirming @cgpoh and updating https://github.com/apache/polaris/pull/44#issuecomment-2323108039

@c-thiel as stated, please feel free to create a PR for this feature. Thanks.

c-thiel commented 1 month ago

@mayankvadariya I am not a Java dev - more involved with Rust and Python. I probably won't be able to work on this soon.