trinodb / trino

Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
https://trino.io
Apache License 2.0
10.37k stars 2.98k forks source link

Extend `ConnectorMetadata` to support `objectExists` check #12069

Open Riddle4045 opened 2 years ago

Riddle4045 commented 2 years ago

Full Context here https://github.com/trinodb/trino/pull/11555#discussion_r846716138

Tl;dr

Executing DML statements such as DROP MATERIALIZED VIEW or DROP VIEW executes DropMaterializedViewTask and DropTableTasks these tasks check if the object is the a table instead of a materializedView or a view by invoking metadata.getTableHandle which in case of shared metastore like Iceberg, Hive, delta lake etc can throw bogus exceptions like Cannot query Iceberg/Hive table when queried under the wrong context.

The suggestion here is to introduce a tableExists method in ConnectorMetadata that returns information about presence/absence of the tableHandle.

findepi commented 2 years ago

cc @martint @kokosing @losipiuk @electrum

findepi commented 2 years ago

Context here #11555 (comment)

@findinpath can you copy under this issue rationale why we would want to have tableExists method? also, would relationExists suite the needs?

Riddle4045 commented 2 years ago

Context here #11555 (comment)

@findinpath can you copy under this issue rationale why we would want to have tableExists method? also, would relationExists suite the needs?

@findinpath I summarized the problem statement - let me know if it looks accurate. Thanks!

findinpath commented 2 years ago

Let's follow @findepi's suggestion and adopt a more generic name for the method relationExists. The reasoning is that a relation may be a table or a view.

Also worth mentioning

If you lookup in the code for the usages of the method io.trino.spi.connector.ConnectorMetadata#getTableHandle(io.trino.spi.connector.ConnectorSession, io.trino.spi.connector.SchemaTableName) you will notice a few places containing checks like:

metadata.getTableHandle(session, name) == null

which can obviously be changed to use this newly introduced method.

Riddle4045 commented 2 years ago

Let's follow @findepi's suggestion and adopt a more generic name for the method relationExists. The reasoning is that a relation may be a table or a view.

Also worth mentioning

If you lookup in the code for the usages of the method io.trino.spi.connector.ConnectorMetadata#getTableHandle(io.trino.spi.connector.ConnectorSession, io.trino.spi.connector.SchemaTableName) you will notice a few places containing checks like:

metadata.getTableHandle(session, name) == null

which can obviously be changed to use this newly introduced method.

@findepi @findinpath is the idea behind relationExists to pass in a fullyQualitifedIdentifer and a type of relation? Are we intending to support queries like roleExists schemaExists all in one place?

findinpath commented 2 years ago

IMO this method is only about checking whether a table/view exists.

losipiuk commented 2 years ago

The suggestion here is to introduce a tableExists method in ConnectorMetadata that returns information about presence/absence of the tableHandle

Alternative would be to catch exception within DropMaterializedViewTask/DropTableTasks and interpret it as "table does not exists". Would that be possible to filter based on error code?

Somewhat related: https://github.com/trinodb/trino/issues/6551

Riddle4045 commented 2 years ago

The suggestion here is to introduce a tableExists method in ConnectorMetadata that returns information about presence/absence of the tableHandle

Alternative would be to catch exception within DropMaterializedViewTask/DropTableTasks and interpret it as "table does not exists". Would that be possible to filter based on error code?

Somewhat related: #6551

@losipiuk thanks for pointing out the discussion. We could catch the exception but since exceptions for metadata provider implementations are not standardized ( also pointed in the related discussion ) this solution might not cover all future cases.

Riddle4045 commented 2 years ago

IMO this method is only about checking whether a table/view exists.

@findinpath qq - there are some APIs checking for view defintion in MetadataManager isView & isMaterializedView - how do those play into this change?

findinpath commented 2 years ago

there are some APIs checking for view defintion in MetadataManager isView & isMaterializedView - how do those play into this change?

In case of Hive / Iceberg internally these calls make a metastore.getTable call and then add corresponding logic on top of it. To answer your question, these methods are unaffected by this new addition.

The method relationExists would be primarily relevant in the shared metastore space where we just want to know whether a relation with the specified name already exists - without caring whether the relation is a table or a view or if it is Hive/Iceberg compatible.

findepi commented 2 years ago

The method relationExists would be primarily relevant in the shared metastore space where we just want to know whether a relation with the specified name already exists - without caring whether the relation is a table or a view or if it is Hive/Iceberg compatible.

That means that we would want to return true without checking whether the thing is actually a true relation (a table, a view or an MV). The object may turn out to be neither of these (unqueryable).

Maybe we should make the method name according to that: doesObjectExist.

cc @martint

Riddle4045 commented 2 years ago

The method relationExists would be primarily relevant in the shared metastore space where we just want to know whether a relation with the specified name already exists - without caring whether the relation is a table or a view or if it is Hive/Iceberg compatible.

That means that we would want to return true without checking whether the thing is actually a true relation (a table, a view or an MV). The object may turn out to be neither of these (unqueryable).

Maybe we should make the method name according to that: doesObjectExist.

cc @martint

@findepi Updated #12128

findepi commented 2 years ago

cc @trinodb/maintainers for pending agreement.

findinpath commented 2 years ago

Can we reach an agreement whether this feature is worth implementing?

sopel39 commented 2 years ago

I would think we probably need boolean tableExist/viewExist/materializedViewExists method with default fallback. I think such methods would be beneficial as we wouldn't need for example to perform query translation (Hive -> Trino) as part of DROP ...

findepi commented 2 years ago

@sopel39 can you elaborate on a benefit of having this as 3 separate methods, as compared to single relation/object exists check proposed here?

also, did you consider a single method that would check relation/object existence and also return the type of that relation/object?

sopel39 commented 2 years ago

@sopel39 can you elaborate on a benefit of having this as 3 separate methods, as compared to single relation/object exists check proposed here?

Well, we have hints like: you wanted to drop MV, but table like X exists.

also, did you consider a single method that would check relation/object existence and also return the type of that relation/object?

I think it's too complicated, but I don't have strong feelings.

Riddle4045 commented 2 years ago

@sopel39 can you elaborate on a benefit of having this as 3 separate methods, as compared to single relation/object exists check proposed here?

Well, we have hints like: you wanted to drop MV, but table like X exists.

also, did you consider a single method that would check relation/object existence and also return the type of that relation/object?

I think it's too complicated, but I don't have strong feelings.

@trinodb/maintainers bumping this up.