trinodb / trino

Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
https://trino.io
Apache License 2.0

Improve docs for MVs with federated queries #15199

Open mosabua opened 1 year ago

mosabua commented 1 year ago

Following up on #15108.

We might have to clarify this further, e.g. does it work if the MV is in one catalog but the source table is in another catalog, and both use Iceberg?

Also, what really happens when Iceberg tables are outdated but other catalogs are involved? What query is actually run?

We should offer some guidance on what a user should do. For example, if you know the underlying tables are outdated, does manually running a refresh update all the data? Is that going to be a very heavy operation, since it queries all sources as on the first refresh and overwrites all the data?

Ideally, @colebow or @bitsondatadev can work with @findepi, @raunaqmorarka, and @claudiusli to get this clarified and updated.

findepi commented 1 year ago

Thank you @mosabua for this ticket.

My answers are below.

We might have to clarify this further, e.g. does it work if the MV is in one catalog but the source table is in another catalog, and both use Iceberg?

It won't work: the source table in the other catalog is treated as "non-Iceberg". Yes, we need to clarify that.

Also, what really happens when Iceberg tables are outdated but other catalogs are involved? What query is actually run?

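Currently, when the Iceberg source tables are outdated, "the view is known to be stale" and it gets inlined: the materialized state isn't used, and the view definition is run against the source tables instead. This is consistent with the behavior of materialized views over Iceberg tables only.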
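The freshness behavior described in this reply, together with the cross-catalog answer above, can be modeled roughly as follows. This is a simplified Python sketch, not Trino's implementation: the `SourceTable` type, its fields, and the `is_tracked`, `is_known_stale`, and `choose_plan` functions are hypothetical, and snapshot IDs stand in for whatever state the connector records at refresh time. It assumes, as the phrase "known to be stale" suggests, that changes to non-Iceberg (or other-catalog) sources cannot be detected and therefore never mark the view stale.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class SourceTable:
    """Hypothetical description of one table referenced by the MV definition."""
    name: str
    catalog: str
    is_iceberg: bool
    current_snapshot: Optional[int] = None  # only meaningful for Iceberg tables


def is_tracked(table: SourceTable, mv_catalog: str) -> bool:
    # Assumption: only Iceberg tables in the MV's own catalog have detectable
    # freshness; an Iceberg table in another catalog counts as "non-Iceberg".
    return table.is_iceberg and table.catalog == mv_catalog


def is_known_stale(sources: List[SourceTable], mv_catalog: str,
                   snapshots_at_refresh: Dict[str, Optional[int]]) -> bool:
    for table in sources:
        if not is_tracked(table, mv_catalog):
            continue  # changes here cannot be detected, so they never mark the view stale
        if snapshots_at_refresh.get(table.name) != table.current_snapshot:
            return True  # a tracked Iceberg table changed since the last refresh
    return False


def choose_plan(sources: List[SourceTable], mv_catalog: str,
                snapshots_at_refresh: Dict[str, Optional[int]]) -> str:
    if is_known_stale(sources, mv_catalog, snapshots_at_refresh):
        return "inline view definition"  # query the source tables directly
    return "read materialized data"  # use the stored result, which may be outdated


# Usage: an Iceberg table in the MV's catalog joined with a PostgreSQL table.
orders = SourceTable("orders", "iceberg", True, current_snapshot=42)
customers = SourceTable("customers", "postgresql", False)
print(choose_plan([orders, customers], "iceberg", {"orders": 41}))  # inline view definition
print(choose_plan([orders, customers], "iceberg", {"orders": 42}))  # read materialized data
```

Note the second call: even if `customers` changed after the last refresh, the model still reads the materialized data, because that change is invisible to the freshness check.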

does manually running a refresh update all the data?

Yes.

Is that going to be a very heavy operation, since it queries all sources as on the first refresh and overwrites all the data?

"very heavy" may mean different things to different people.

Yes, it's as expensive as the first refresh. To the best of my knowledge, we don't have incremental refreshes at all, not even for materialized views over Iceberg tables only, so nothing changes here.
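To make the cost argument concrete, here is a small Python model of a refresh without incremental support: every refresh re-reads all source rows and overwrites the stored result, so a refresh after a one-row change scans about as much as the first refresh did. The `FullRefreshView` class and the `total_by_customer` query are hypothetical illustrations, not Trino APIs, and counting scanned rows is a simplification of real query cost.

```python
from typing import Callable, Dict, List

Row = Dict[str, int]


class FullRefreshView:
    """Hypothetical model of a materialized view refreshed without incremental support."""

    def __init__(self, query: Callable[[List[Row]], List[Row]]):
        self.query = query
        self.storage: List[Row] = []
        self.rows_scanned_last_refresh = 0

    def refresh(self, source_rows: List[Row]) -> None:
        # Every refresh re-reads all source rows, exactly like the first refresh...
        self.rows_scanned_last_refresh = len(source_rows)
        # ...and replaces the stored result wholesale instead of merging a delta.
        self.storage = self.query(list(source_rows))


def total_by_customer(rows: List[Row]) -> List[Row]:
    """Example view definition: SUM(amount) grouped by customer."""
    totals: Dict[int, int] = {}
    for row in rows:
        totals[row["customer"]] = totals.get(row["customer"], 0) + row["amount"]
    return [{"customer": c, "amount": a} for c, a in sorted(totals.items())]


# Usage: one new source row still forces a scan of every row on the next refresh.
source = [{"customer": i % 3, "amount": 10} for i in range(1000)]
mv = FullRefreshView(total_by_customer)
mv.refresh(source)
first_cost = mv.rows_scanned_last_refresh
source.append({"customer": 0, "amount": 5})
mv.refresh(source)
print(first_cost, mv.rows_scanned_last_refresh)  # 1000 1001
```

An incremental design would instead scan only the appended row and adjust the affected total; per this thread, no such refresh mode exists in Trino.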

mosabua commented 1 year ago

Also note https://github.com/trinodb/trino/pull/15108#issuecomment-1323479451

mosabua commented 1 year ago

So, from what I understand, for incremental refresh to work, the MV and all of its source tables have to be in the same catalog, and that catalog has to use the Iceberg connector.

mosabua commented 1 year ago

Also, @colebow and @bitsondatadev, we should potentially update the docs for MVs in the SQL section to talk about the source query and how the behavior may differ.

findepi commented 1 year ago

So, from what I understand, for incremental refresh to work, the MV and all of its source tables have to be in the same catalog, and that catalog has to use the Iceberg connector.

No. For incremental refresh to work, Trino needs an "incremental materialized view refresh" feature. To the best of my knowledge, it doesn't exist at all yet (https://github.com/trinodb/trino/issues/18673). Are you perhaps confusing this with a proprietary Starburst solution?