trinodb / trino

Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
Apache License 2.0

Does trino support Azure Managed Identity with ADLS Gen2? #23301

Closed: fantasyljc closed this issue 3 days ago

fantasyljc commented 1 week ago

We have a problem connecting to ADLS Gen2 using Azure Managed Identity.

fantasyljc commented 1 week ago

core-site.xml

<configuration>
<property>
  <name>fs.defaultFS</name>
  <value>abfss://cdp@mysa01.dfs.core.chinacloudapi.cn/</value>
  <final>true</final>
</property>

<property>
  <name>fs.azure.account.auth.type</name>
  <value>OAuth</value>
  <description>
  Use OAuth authentication
  </description>
</property>
<property>
  <name>fs.azure.account.oauth.provider.type</name>
  <value>org.apache.hadoop.fs.azurebfs.oauth2.MsiTokenProvider</value>
  <description>
  Use MSI for issuing OAuth tokens
  </description>
</property>
<property>
  <name>fs.azure.account.oauth2.msi.tenant</name>
  <value>xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx</value>
  <description>
  Optional MSI Tenant ID
  </description>
</property>
<property>
  <name>fs.azure.account.oauth2.msi.endpoint</name>
  <value>http://169.254.169.254/metadata/identity/oauth2/token</value>
  <description>
   MSI endpoint
  </description>
</property>
<property>
  <name>fs.azure.account.oauth2.client.id</name>
  <value>xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx</value>
  <description>
  Optional Client ID
  </description>
</property>

</configuration>
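For context, the MsiTokenProvider setting above makes Hadoop request OAuth tokens from the Azure Instance Metadata Service (IMDS) endpoint given in fs.azure.account.oauth2.msi.endpoint. The Python sketch below builds a standard IMDS managed-identity token request, which can help confirm from the Trino VM that a token is obtainable at all. It is a minimal sketch under stated assumptions, not a reproduction of Hadoop's exact request: the api-version value and the global storage resource `https://storage.azure.com/` are assumptions (Azure China, i.e. chinacloudapi.cn, may need a different resource ID), and `build_token_request`/`fetch_token` are hypothetical helpers, not part of Hadoop or Trino. The optional `client_id` plays the role of fs.azure.account.oauth2.client.id, selecting a user-assigned identity.

```python
import json
import urllib.parse
import urllib.request

# Endpoint copied from fs.azure.account.oauth2.msi.endpoint in core-site.xml.
MSI_ENDPOINT = "http://169.254.169.254/metadata/identity/oauth2/token"

# Assumption: global Azure Storage resource ID. Azure China may require a
# different resource; adjust if the storage service rejects the token.
STORAGE_RESOURCE = "https://storage.azure.com/"


def build_token_request(client_id=None, endpoint=MSI_ENDPOINT,
                        resource=STORAGE_RESOURCE):
    """Build (but do not send) an IMDS managed-identity token request."""
    params = {"api-version": "2018-02-01", "resource": resource}
    if client_id:
        # Selects a user-assigned identity; omit it for the system-assigned one.
        params["client_id"] = client_id
    url = endpoint + "?" + urllib.parse.urlencode(params)
    # IMDS rejects token requests that lack the "Metadata: true" header.
    return urllib.request.Request(url, headers={"Metadata": "true"})


def fetch_token(request, timeout=5):
    """Send the request; only succeeds from an Azure VM with an identity."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        body = json.load(resp)
    return body["access_token"]


if __name__ == "__main__":
    # Print the request URL only; no network call is made here.
    print(build_token_request().full_url)
```

On the Trino VM itself, `fetch_token(build_token_request(client_id="<your client id>"))` would attempt the real call; an HTTP error there would point at the identity setup rather than at Trino.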

hive.properties

hive.metastore-refresh-interval=1m
connector.name=hive
hive.metastore-cache-ttl=20m
hive.non-managed-table-writes-enabled = true
hive.recursive-directories = true
hive.metastore-refresh-interval=10s
hive.metastore.uri=thrift://10.3.4.69:9083
hive.config.resources=/etc/hadoop/conf/core-site.xml,/etc/hadoop/conf/hdfs-site.xml
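Note that hive.metastore-refresh-interval is set twice in this file (1m and 10s). Which value Trino ends up applying is not shown here, so it is worth keeping only one. The Python sketch below flags duplicate keys in a properties file; `find_duplicate_keys` is a hypothetical helper, and it assumes a simplified format (one `key=value` per line, `#`/`!` comments, no line continuations or escapes).

```python
from collections import defaultdict


def find_duplicate_keys(text):
    """Return {key: [values...]} for every key assigned more than once."""
    seen = defaultdict(list)
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and lines without an assignment.
        if not line or line[0] in "#!" or "=" not in line:
            continue
        # Split on the first '=' only, so values like thrift URIs stay intact.
        key, value = line.split("=", 1)
        seen[key.strip()].append(value.strip())
    return {k: v for k, v in seen.items() if len(v) > 1}


# The hive.properties posted above, verbatim.
HIVE_PROPERTIES = """\
hive.metastore-refresh-interval=1m
connector.name=hive
hive.metastore-cache-ttl=20m
hive.non-managed-table-writes-enabled = true
hive.recursive-directories = true
hive.metastore-refresh-interval=10s
hive.metastore.uri=thrift://10.3.4.69:9083
hive.config.resources=/etc/hadoop/conf/core-site.xml,/etc/hadoop/conf/hdfs-site.xml
"""

if __name__ == "__main__":
    for key, values in find_duplicate_keys(HIVE_PROPERTIES).items():
        print(f"{key} is set {len(values)} times: {values}")
```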

error log

root@ops:/mnt# trino-cli --server=10.3.1.4:8080 --catalog=hive --schema=default
trino:default> select * from default.test01;
Query 20240905_130215_00001_2wyp4 failed: Failed to list directory: abfss://cdp@mysa01.dfs.core.chinacloudapi.cn/tmp/test01
2024-09-05T13:02:16.906Z    INFO    dispatcher-query-5  io.trino.event.QueryMonitor TIMELINE: Query 20240905_130215_00001_2wyp4 :: FAILED (HIVE_FILESYSTEM_ERROR) :: elapsed 1816ms :: planning 115ms :: waiting 452ms :: scheduling 1701ms :: running 0ms :: finishing 1701ms :: begin 2024-09-05T13:02:15.085Z :: end 2024-09-05T13:02:16.901Z
fantasyljc commented 1 week ago

HDFS check, run on the same VM: listing the same directory with the HDFS CLI works.

hdfs dfs -ls abfss://cdp@mysa01.dfs.core.chinacloudapi.cn/tmp/test01
Found 1 items
-rw-r--r--   1 3bf125e1-6445-44cd-b7ea-d4eb3efc79b0 root         17 2024-09-05 11:05 abfss://cdp@mysa01.dfs.core.chinacloudapi.cn/tmp/test01/000000_0