unitycatalog / unitycatalog

Open, Multi-modal Catalog for Data & AI
https://unitycatalog.io/
Apache License 2.0

Trino Integration not able to access UC data #740

[Open] avriiil opened this issue 3 days ago

avriiil commented 3 days ago

Describe the bug

I'm testing the UC x Trino integration and running into some trouble. Both are running locally, with Trino inside the latest trino Docker image.

I'm following the steps outlined in the docs:

  1. Start a local UC server.
  2. Set iceberg.properties in the Trino Docker container as follows:

      connector.name=iceberg
      iceberg.catalog.type=rest
      iceberg.rest-catalog.uri=http://127.0.0.1:8080/api/2.1/unity-catalog/iceberg
      iceberg.rest-catalog.security=OAUTH2
      iceberg.rest-catalog.oauth2.token=not_used

  3. Launch the Trino container and run a SQL query from the Trino CLI:

      SELECT * FROM iceberg."unity.default".marksheet_uniform;

This fails with Error 404 Not Found: HTTP 404 Not Found. I presume this is because the UC server on the host's port 8080 can't be reached from inside the Docker container.
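To tell a network problem apart from an HTTP-level error, a minimal Python sketch like the following can probe the Iceberg REST config endpoint from the same network context as Trino. It assumes, per the Iceberg REST catalog specification, that the client requests GET /v1/config relative to iceberg.rest-catalog.uri and may pass the warehouse as a query parameter; the helper names config_url and probe are hypothetical. One hedged observation: the trino image's own HTTP server listens on port 8080 by default, so inside the Trino container 127.0.0.1:8080 probably reaches Trino itself rather than UC, which would explain a 404 instead of a connection error.

```python
from typing import Optional
from urllib.error import HTTPError
from urllib.parse import urlencode
from urllib.request import urlopen


def config_url(base_uri: str, warehouse: Optional[str] = None) -> str:
    """Build the Iceberg REST 'get config' URL (GET /v1/config) for a base URI."""
    url = base_uri.rstrip("/") + "/v1/config"
    if warehouse:
        # The Iceberg REST spec allows the warehouse to be sent as a query parameter.
        url += "?" + urlencode({"warehouse": warehouse})
    return url


def probe(url: str, timeout: float = 5.0) -> str:
    """Return a short status string instead of raising.

    Any HTTP status (even 404) means *something* answered at that host:port;
    'unreachable' means the host:port could not be contacted at all.
    """
    try:
        with urlopen(url, timeout=timeout) as resp:
            return f"HTTP {resp.status}"
    except HTTPError as err:  # a server answered, but with an error status
        return f"HTTP {err.code}"
    except OSError as err:  # URLError (DNS, connection refused) and socket timeouts
        return f"unreachable: {getattr(err, 'reason', err)}"
```

For example, running probe(config_url("http://127.0.0.1:8080/api/2.1/unity-catalog/iceberg")) on the host and then from inside a container should show whether both addresses reach the same server.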

  4. As suggested here, I tried forwarding the UC server to a remote port with serveo using ssh -R 80:localhost:8080 serveo.net
  5. Update iceberg.properties accordingly:

      iceberg.rest-catalog.uri=http://<<serveo-public-URL>>/api/2.1/unity-catalog/iceberg

  6. Any SQL query other than SHOW CATALOGS; (e.g. SHOW SCHEMAS FROM iceberg;) now throws this error:

      Malformed request: Must supply a proper catalog in warehouse property.

Expected behavior

Access schema and table data.

System [please complete the following information]:

[EDIT: accessing the Serveo URL in my browser shows me the Hello, Unity Catalog! message]

tdas commented 2 days ago

We are changing the configuration a little by requiring the warehouse property to specify which UC catalog to connect to. This should be released as part of the upcoming 0.2.1. Here is the change we made: https://github.com/unitycatalog/unitycatalog/pull/324
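With this change, the Trino catalog properties file needs an iceberg.rest-catalog.warehouse entry naming the UC catalog. The Python sketch below renders such a file from the keys used in this thread. The function name is hypothetical, and the values (OAUTH2 with the placeholder token not_used, warehouse unity) are simply the ones from this report, not a recommended setup.

```python
def render_iceberg_properties(uri: str, warehouse: str, token: str = "not_used") -> str:
    """Render a Trino Iceberg catalog properties file pointing at UC's Iceberg REST API."""
    props = {
        "connector.name": "iceberg",
        "iceberg.catalog.type": "rest",
        "iceberg.rest-catalog.uri": uri,
        "iceberg.rest-catalog.security": "OAUTH2",
        "iceberg.rest-catalog.oauth2.token": token,
        # Required after the change above: which UC catalog this Trino catalog exposes.
        "iceberg.rest-catalog.warehouse": warehouse,
    }
    # Dicts keep insertion order, so lines come out in the order listed above.
    return "".join(f"{key}={value}\n" for key, value in props.items())
```

For example, render_iceberg_properties("https://abc.serveo.net/api/2.1/unity-catalog/iceberg", "unity") produces a six-line file ending in iceberg.rest-catalog.warehouse=unity.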

avriiil commented 2 days ago

Thanks for this, @tdas; I've got it working now. I will update the Trino docs accordingly.

avriiil commented 2 days ago

Closing this one; the docs changes are tracked in #744.

avriiil commented 2 days ago

I celebrated too early: I can now access schemas and list tables, but I can't access the data within tables.

SHOW TABLES FROM iceberg.default; works, but SELECT * FROM iceberg.default.marksheet_uniform LIMIT 5; throws:

    Failed to load table: marksheet_uniform in default namespace
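One way to see why listing can work while loading fails: under the Iceberg REST catalog specification these are different requests. SHOW TABLES presumably maps to listing a namespace (GET /v1/{prefix}/namespaces/{namespace}/tables), while SELECT must first load the table (GET /v1/{prefix}/namespaces/{namespace}/tables/{table}). The Python sketch below builds both URLs so each can be tried directly against the server. The optional prefix is an assumption (the spec lets the server supply one in its config response), and the helper names are hypothetical.

```python
from typing import List, Optional
from urllib.parse import quote


def _v1_base(uri: str, prefix: Optional[str] = None) -> str:
    """Base for Iceberg REST v1 routes; prefix is whatever the server's config returns, if any."""
    base = uri.rstrip("/") + "/v1"
    if prefix:
        base += "/" + prefix.strip("/")
    return base


def list_tables_url(uri: str, namespace: List[str], prefix: Optional[str] = None) -> str:
    """GET target for listing the tables in a namespace."""
    # Multi-level namespaces are joined with the 0x1F unit separator, then URL-encoded.
    ns = quote("\x1f".join(namespace), safe="")
    return f"{_v1_base(uri, prefix)}/namespaces/{ns}/tables"


def load_table_url(uri: str, namespace: List[str], table: str, prefix: Optional[str] = None) -> str:
    """GET target for loading a single table's metadata."""
    return f"{list_tables_url(uri, namespace, prefix)}/{quote(table, safe='')}"
```

With the warehouse property selecting the UC catalog, the namespace here is just the schema, e.g. load_table_url(uri, ["default"], "marksheet_uniform").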

tdas commented 1 day ago

What is your full configuration in Trino? Also, @alexreid-db, can you help here, since you implemented the warehouse change?

avriiil commented 1 day ago

Here's my full config:

    iceberg.catalog.type=rest
    iceberg.rest-catalog.uri=https://abc.serveo.net/api/2.1/unity-catalog/iceberg
    iceberg.rest-catalog.security=OAUTH2
    iceberg.rest-catalog.oauth2.token=not_used
    iceberg.rest-catalog.warehouse=unity
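Comparing this with the configuration in the original report, a quick check is to parse the file and list which of the keys used in this thread are absent. The hypothetical Python helper below does simple key=value parsing only (it is not a full Java .properties parser). Run on the snippet above, it reports that connector.name is missing, which may just mean that line was left out of the paste.

```python
from typing import Dict, List

# Keys used by the Trino Iceberg REST configurations in this thread.
EXPECTED_KEYS = [
    "connector.name",
    "iceberg.catalog.type",
    "iceberg.rest-catalog.uri",
    "iceberg.rest-catalog.security",
    "iceberg.rest-catalog.oauth2.token",
    "iceberg.rest-catalog.warehouse",
]


def parse_properties(text: str) -> Dict[str, str]:
    """Parse simple key=value lines, skipping blank lines and #/! comments."""
    props: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")  # split on the first '=' only
        if sep:
            props[key.strip()] = value.strip()
    return props


def missing_keys(text: str) -> List[str]:
    """Return the expected keys that do not appear in the properties text."""
    props = parse_properties(text)
    return [key for key in EXPECTED_KEYS if key not in props]
```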

student2o commented 1 day ago

I am unable to view the schemas and tables in Unity Catalog with the following configuration:

    connector.name=iceberg
    iceberg.catalog.type=rest
    iceberg.rest-catalog.uri=http://localhost:30202/api/2.1/unity-catalog/unity
    iceberg.rest-catalog.warehouse=unity
    iceberg.rest-catalog.security=OAUTH2
    iceberg.rest-catalog.oauth2.token=not_used

Error:

    trino> show schemas from unity_catalog;
    Query 20241120_111824_00002_njjp4 failed: Unable to process: Status: 404 Description: Not Found

I am running Unity Catalog in a Docker container using the following command: docker run -p 30202:8080 -it unity-catalog

While validating the Unity Catalog URI using the Databricks API, I discovered that the correct URI is: http://localhost:30202/api/2.1/unity-catalog/catalogs/unity

How can I properly view schemas and tables in Unity Catalog from Trino? With my current configuration, I can't access or query them.
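Two details in this comment may be worth separating. docker run -p 30202:8080 publishes the container's port 8080 on host port 30202, so host-side URLs use 30202. And the URL validated here, /api/2.1/unity-catalog/catalogs/unity, is Unity Catalog's own REST API for a catalog, whereas the configuration in the original report points Trino at the Iceberg REST base /api/2.1/unity-catalog/iceberg (the uri above ends in /unity-catalog/unity instead). The hypothetical Python helper below derives both from a host URL and a catalog name, assuming the paths seen in this thread.

```python
from typing import Dict

UC_API_ROOT = "/api/2.1/unity-catalog"  # path prefix used throughout this thread


def uc_urls(host_url: str, catalog: str) -> Dict[str, str]:
    """Derive the UC REST and Iceberg REST settings for a given server and catalog."""
    root = host_url.rstrip("/") + UC_API_ROOT
    return {
        # Unity Catalog's own REST API: details of one catalog.
        "uc_catalog": f"{root}/catalogs/{catalog}",
        # Value for iceberg.rest-catalog.uri (the same base for every UC catalog).
        "iceberg_rest_uri": f"{root}/iceberg",
        # Value for iceberg.rest-catalog.warehouse: selects the UC catalog.
        "warehouse": catalog,
    }
```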

avriiil commented 1 day ago

It may be because Trino can’t reach your localhost:30202 from inside its Docker container. You need to forward it to a public URL, for example using serveo or ngrok.
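When forwarding through a tunnel (e.g. ssh -R 80:localhost:8080 serveo.net, as earlier in this thread), only the scheme and host of iceberg.rest-catalog.uri change; the path stays the same. Here is a small hypothetical Python helper for that rewrite, using only urllib.parse from the standard library.

```python
from urllib.parse import urlsplit, urlunsplit


def rebase_uri(local_uri: str, public_base: str) -> str:
    """Swap the scheme and host:port of local_uri for those of public_base, keeping the path."""
    local = urlsplit(local_uri)
    public = urlsplit(public_base)
    return urlunsplit((public.scheme, public.netloc, local.path, local.query, local.fragment))
```

For example, rebase_uri("http://127.0.0.1:8080/api/2.1/unity-catalog/iceberg", "https://abc.serveo.net") gives the uri used in the full config earlier in this thread.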

I’m working on more complete documentation, which should be up in the next few days.
