trinodb / trino

Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
Apache License 2.0

Iceberg tables not registered in system.jdbc.tables #22868

Closed (szisiu closed this 1 month ago)

szisiu commented 3 months ago

I found that Iceberg tables are not registered in system.jdbc.tables and system.jdbc.columns.

I use IDEs (such as IntelliJ and DBeaver) with the trino-jdbc driver JAR to connect to my Trino clusters, and those tools can no longer discover the tables.

It stopped working as of Trino v443 (I tested several older and newer versions, up to v453, with Docker).

Note that the tables exist and can be successfully queried (e.g. from a Java app with the trino-jdbc driver), but the missing listing is annoying during development.

How to reproduce:

create schema iceberg.test with (location='s3a://test/test/'); -- minio
create table iceberg.test.tbl as select * from tpch.sf1.customer limit 10;
select * from system.jdbc.tables where table_cat = 'iceberg';
select * from iceberg.test.tbl;
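
To narrow this down, it may help to compare the listing from the catalog's information_schema with the one from system.jdbc.tables, since IDEs rely on the JDBC metadata path. Below is a minimal Python sketch of that comparison. It assumes the rows have already been fetched by some client; the helper name `missing_from_jdbc` and the sample rows are hypothetical and only illustrate the idea.

```python
# Hypothetical helper: compare two table listings for one catalog.
# The queries are plain Trino SQL; how they are executed is left to the client.

INFO_SCHEMA_QUERY = (
    "SELECT table_schema, table_name "
    "FROM iceberg.information_schema.tables "
    "WHERE table_schema = 'test'"
)
JDBC_QUERY = (
    "SELECT table_schem, table_name "
    "FROM system.jdbc.tables "
    "WHERE table_cat = 'iceberg' AND table_schem = 'test'"
)


def missing_from_jdbc(info_schema_rows, jdbc_rows):
    """Return (schema, table) pairs listed in information_schema
    but absent from system.jdbc.tables."""
    seen = {(schema, table) for schema, table in jdbc_rows}
    listed = {(schema, table) for schema, table in info_schema_rows}
    return sorted(listed - seen)


# Made-up rows for illustration only (not taken from a real cluster).
info_rows = [("test", "tbl")]
jdbc_rows = []
print(missing_from_jdbc(info_rows, jdbc_rows))
```

An empty result would mean both listings agree; any pair returned is a table that the JDBC metadata view does not report.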

This issue sounds similar to https://github.com/trinodb/trino/issues/11060

ebyhr commented 3 months ago

I can't reproduce the issue on master. Can you share your Iceberg catalog properties after masking any confidential info? Are these the complete steps to reproduce the issue? Also, did you enable a metadata cache or something similar in those IDEs?

Screenshot 2024-07-30 at 8 24 35

szisiu commented 3 months ago

Sure. Let me share more details.

Environment:

My containerized deployment consists of one Trino v453 service (coordinator and worker with the Iceberg connector), one MinIO service, one MySQL v8 service, and one Hive Metastore service (HMS).

I prepared and attached files that we can use as an example.

trino-docker-test.zip

Also, this is my IntelliJ setup for the Trino data source (mostly defaults, except for the Trino driver version and connection details):

trino-ij-setup1

trino-ij-setup2

trino-ij-setup3

trino-ij-setup4

No tables are displayed for the test schema, but they exist and the data is there. I tried different settings, including with all schemas enabled/visible.

trino-ij-db-view

trino-iceberg-ij-output

image

As mentioned, the same issue happens with other tools, e.g. DBeaver.

Thanks in advance.

szisiu commented 2 months ago

I tried again with v454, without success. The Iceberg catalog and schema are listed, but the tables are not.

Did you have a chance to look at the attached example?

gietki commented 2 months ago

Any new info? I'm dealing with the same situation.

szisiu commented 2 months ago

Additional info: in my configuration, Hive Metastore Standalone v3.1.0 is used as the metadata catalog for Iceberg.

During the investigation in https://github.com/trinodb/trino/issues/23132, it appeared that the problem does not occur when the following

<property>
    <name>metastore.rawstore.impl</name>
    <value>org.apache.hadoop.hive.metastore.cache.CachedStore</value>
</property>

property is removed from the Hive Metastore Standalone config (metastore-site.xml).

With that change, Iceberg tables are registered in system.jdbc.tables and system.jdbc.columns again.
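
For anyone wanting to check whether their own metastore-site.xml still sets this property, here is a minimal Python sketch. It assumes the usual Hadoop-style layout with `<property>` elements under a `<configuration>` root; the function name `uses_cached_store` and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET

CACHED_STORE = "org.apache.hadoop.hive.metastore.cache.CachedStore"


def uses_cached_store(site_xml: str) -> bool:
    """Return True if metastore.rawstore.impl is set to CachedStore
    in the given XML text."""
    root = ET.fromstring(site_xml)
    for prop in root.iter("property"):
        name = (prop.findtext("name") or "").strip()
        value = (prop.findtext("value") or "").strip()
        if name == "metastore.rawstore.impl" and value == CACHED_STORE:
            return True
    return False


# Sample config for illustration; a real file would be read from disk.
sample = """
<configuration>
  <property>
    <name>metastore.rawstore.impl</name>
    <value>org.apache.hadoop.hive.metastore.cache.CachedStore</value>
  </property>
</configuration>
"""
print(uses_cached_store(sample))
```

This only inspects the file contents; it does not tell you which store implementation a running metastore actually loaded.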

On the other hand, leaving CachedStore enabled seems like a reasonable default for HMS (it should improve query performance, though that would need to be measured to be sure).

I'll leave it to the Trino team to decide whether anything here needs to be improved.