trinodb / trino

Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
https://trino.io
Apache License 2.0
10.52k stars 3.03k forks source link

Support cross-catalog materialized views in Iceberg #13131

Closed dprophet closed 2 years ago

dprophet commented 2 years ago

CREATE OR REPLACE MATERIALIZED VIEW iceberg.erik.test WITH ( format = 'PARQUET', partitioning = ARRAY['company_bbid'] ) AS SELECT * FROM postgres.small.table

Fails with an error "Cross connector materialized views are not supported" https://github.com/trinodb/trino/blob/master/core/trino-main/src/main/java/io/trino/metadata/MetadataManager.java#L908

Claudius Li: Thanks for that feedback. That was an oversight and we should address it.

Adding this issue to track its progress.

Use Case: We want to be able to use materialized views as a way to take data from production data sources and put into Iceberg for more heavy-lifting/deeper-analytics on said Iceberg tables. This essentially eliminates the unnecessary abuse from running deep analytics on more load sensitive data storage systems.

findepi commented 2 years ago

Such a view can be created

trino:tpch> CREATE OR REPLACE MATERIALIZED VIEW iceberg.tpch.mv WITH (partitioning=ARRAY['regionkey']) AS SELECT * FROM tpch.tiny.nation;
CREATE MATERIALIZED VIEW

it can be queried

trino:tpch> SELECT * FROM iceberg.tpch.mv LIMIT 1;
 nationkey |  name   | regionkey |                       comment
-----------+---------+-----------+-----------------------------------------------------
         0 | ALGERIA |         0 |  haggle. carefully final deposits detect slyly agai

but the queries go directly to the source

trino:tpch> EXPLAIN (TYPE IO) SELECT * FROM iceberg.tpch.mv LIMIT 1;
             Query Plan
-------------------------------------
 {
   "inputTableColumnInfos" : [ {
     "table" : {
       "catalog" : "tpch",
       "schemaTable" : {
         "schema" : "tiny",
         "table" : "nation"

however, it cannot be materialized, which is weird, since we allowed a user to create a MATERIALIZED view that cannot be materialized.

trino:tpch> REFRESH MATERIALIZED VIEW iceberg.tpch.mv;
Query 20220803_125742_00040_gus3y failed: Cross connector materialized views are not supported
io.trino.spi.TrinoException: Cross connector materialized views are not supported
    at io.trino.metadata.MetadataManager.beginRefreshMaterializedView(MetadataManager.java:919)
    at io.trino.sql.planner.optimizations.BeginTableWrite$Rewriter.createWriterTarget(BeginTableWrite.java:284)
    at io.trino.sql.planner.optimizations.BeginTableWrite$Rewriter.visitTableFinish(BeginTableWrite.java:203)
    at io.trino.sql.planner.optimizations.BeginTableWrite$Rewriter.visitTableFinish(BeginTableWrite.java:106)
    at io.trino.sql.planner.plan.TableFinishNode.accept(TableFinishNode.java:106)
...
findepi commented 2 years ago

@sopel39 's https://github.com/trinodb/trino/pull/12499 aims to allow incremental refreshes for such views. I believe that we should support non-incremental materialized views: some connectors won't immediately implement the new SPI, or may never be able to. Also, some query shapes cannot be updated incrementally, and we still want to be able to materialize them.

sopel39 commented 2 years ago

I believe that we should support non-incremental materialized views: some connectors won't immediately implement the new SPI, or may never be able to.

Yes, depending on connector support (source, target) engine will be able to:

or

or

findepi commented 2 years ago

This might help https://github.com/trinodb/trino/pull/15108