prestodb / presto

The official home of the Presto distributed SQL query engine for big data
http://prestodb.io
Apache License 2.0
15.74k stars 5.28k forks

Feature Enhancement: Enable Presto Server to Transmit Catalog Name for Enhanced Functionality in the Metastore Layer #22895

Open AnuragKDwivedi opened 4 weeks ago

AnuragKDwivedi commented 4 weeks ago

Expected Behavior or Use Case

When creating a schema in Presto, the catalog name should be passed to the metastore layer.

Presto Component, Service, or Connector

This request is related to the Presto service and its interaction with the metastore layer.

Possible Implementation

Modify Presto to include the catalog name information when creating a schema, allowing the metastore layer to utilize it internally.

Example Screenshots (if appropriate):

N/A

Context

Currently, Presto does not share the catalog name with the metastore layer when creating a schema. This change will enhance the metastore's ability to effectively manage schemas and metadata.

Passing the catalog name to the metastore layer from Presto would unlock several benefits:

Overall, this enhancement would not only optimize the functionality of Presto but also enhance the capabilities of the metastore layer, leading to a more robust and efficient data processing system.

tdcmeehan commented 4 weeks ago

Can you more concretely describe how the metastore would actually use the Presto catalog name? Is this for some sort of fork of the Hive metastore?

AnuragKDwivedi commented 3 weeks ago

Lakehouses use three-part names for table objects: <catalog-name>.<schema-name>.<table-name>. The same concept extends to other objects: <catalog-name>.<schema-name>.<object-name>.

Currently, metastores like HMS can store only a two-part name, <schema-name>.<table-name>, on the assumption that a metastore holds metadata for only one catalog. This design has several limitations.

First, it forces SQL engines like Presto and Spark to store the third part of the name, <catalog-name>, outside the metastore, which is not an ideal place for metadata.

Second, each SQL engine that accesses an object such as an Iceberg table can name the catalog differently, which makes it difficult to identify a table object uniquely across engines.

Third, it makes HMS single-tenant (one catalog per metastore). By analogy with traditional databases, a single instance there can hold multiple databases, each with its own schemas and table objects: dbname.schemaname.tablename.

To remove these limitations and let Presto work with multi-catalog metastores that support an HMS-like protocol, there needs to be a way to pass the catalog name to the external metastore, which this PR addresses.
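To make the naming problem concrete, here is a minimal Java sketch. It is an illustration under stated assumptions, not Presto or HMS code: `QualifiedNameSketch`, `ObjectName`, `TwoPartStore`, and `ThreePartStore` are hypothetical names, and the catalog names and locations in the usage note are made up. It shows how a store keyed only by schema and object name cannot tell two catalogs apart, while a store keyed by the full three-part name can.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical illustration only: none of these types exist in Presto or HMS.
final class QualifiedNameSketch {

    /** Three-part name: catalog.schema.object. */
    static final class ObjectName {
        final String catalog;
        final String schema;
        final String object;

        ObjectName(String catalog, String schema, String object) {
            this.catalog = Objects.requireNonNull(catalog, "catalog");
            this.schema = Objects.requireNonNull(schema, "schema");
            this.object = Objects.requireNonNull(object, "object");
        }

        @Override
        public boolean equals(Object other) {
            if (this == other) {
                return true;
            }
            if (!(other instanceof ObjectName)) {
                return false;
            }
            ObjectName that = (ObjectName) other;
            return catalog.equals(that.catalog)
                    && schema.equals(that.schema)
                    && object.equals(that.object);
        }

        @Override
        public int hashCode() {
            return Objects.hash(catalog, schema, object);
        }

        @Override
        public String toString() {
            return catalog + "." + schema + "." + object;
        }
    }

    /** Keys entries by schema.object only, so the catalog part is lost. */
    static final class TwoPartStore {
        private final Map<String, String> locations = new HashMap<>();

        void register(ObjectName name, String location) {
            locations.put(name.schema + "." + name.object, location);
        }

        String lookup(ObjectName name) {
            return locations.get(name.schema + "." + name.object);
        }
    }

    /** Keys entries by the full three-part name. */
    static final class ThreePartStore {
        private final Map<ObjectName, String> locations = new HashMap<>();

        void register(ObjectName name, String location) {
            locations.put(name, location);
        }

        String lookup(ObjectName name) {
            return locations.get(name);
        }
    }
}
```

Registering `lake_a.sales.orders` and then `lake_b.sales.orders` in a `TwoPartStore` leaves only the second entry, because both collapse to the key `sales.orders`. The `ThreePartStore` keeps both entries apart.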

Even though Presto will transmit the catalog name to a traditional HMS metastore, that metastore will ignore it. The catalog name won't be retained there, so there is no regression or change in behavior, and full backward compatibility is maintained.
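The backward-compatibility argument can be sketched roughly as follows. This is a hypothetical Java model, not the actual Presto metastore interface or the HMS Thrift API: `SchemaStore`, `SingleCatalogStore`, and `MultiCatalogStore` are invented for illustration. The caller always passes the catalog name; a single-catalog store drops it, and a multi-catalog store uses it as part of the key.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical interface: the caller always supplies the catalog name.
interface SchemaStore {
    void createSchema(String catalogName, String schemaName);

    boolean schemaExists(String catalogName, String schemaName);
}

// Models a traditional single-catalog metastore: the catalog name is
// accepted but ignored, so behavior is the same as before the change.
final class SingleCatalogStore implements SchemaStore {
    private final Set<String> schemas = new HashSet<>();

    @Override
    public void createSchema(String catalogName, String schemaName) {
        schemas.add(schemaName);
    }

    @Override
    public boolean schemaExists(String catalogName, String schemaName) {
        return schemas.contains(schemaName);
    }
}

// Models a multi-catalog metastore: the catalog name is part of the key.
final class MultiCatalogStore implements SchemaStore {
    private final Set<String> schemas = new HashSet<>();

    private static String key(String catalogName, String schemaName) {
        return catalogName + "." + schemaName;
    }

    @Override
    public void createSchema(String catalogName, String schemaName) {
        schemas.add(key(catalogName, schemaName));
    }

    @Override
    public boolean schemaExists(String catalogName, String schemaName) {
        return schemas.contains(key(catalogName, schemaName));
    }
}
```

With this shape, the engine-side call is identical for both kinds of store; only a store that understands catalogs changes its behavior.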