trinodb / trino

Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
Apache License 2.0

Allow querying Iceberg table by its location, without registering it in metastore #2298

Open jdintruff opened 4 years ago

jdintruff commented 4 years ago

The Iceberg connector currently supports reading Iceberg tables that are managed by the Hive metastore. Iceberg also supports external tables where metadata is stored in manifest files alongside the data. I've started working on this and will publish a PR shortly.

jdintruff commented 4 years ago

Umbrella issue: #1324

findepi commented 3 years ago

I am not sure what the scope of this issue is.

jdintruff commented 3 years ago

We needed to read external tables that were stored on HDFS but not managed by the Hive metastore. IIRC the idea was to let the Iceberg connector be pointed at the HDFS location of an external table and read the schema and table properties from the Iceberg manifest file that sits alongside the Avro data on HDFS.

I had done some work on this before I left but I think @lxynov picked it up for a while and now perhaps @phd3 or @rdsr can comment on where this stands. I suspect Xingyuan's version of this ended up being folded into his PR here: https://github.com/trinodb/trino/pull/4776

findepi commented 3 years ago

I hear two things here:

Would you agree these are independent and can be tracked by separate issues?

jdintruff commented 3 years ago

Precisely.

findepi commented 3 years ago

I've updated the issue title accordingly. Feel free to create a dedicated issue for Avro format support.