starburstdata / metabase-driver

Starburst Metabase driver
Apache License 2.0
60 stars 11 forks source link

Is there a plan to support trino metabase model cache? #56

Open ydpzg opened 1 year ago

leniartek commented 1 year ago

Hi @ydpzg thanks for the question. We will investigate the effort on our side and get back to you.

Can you elaborate more on your use case, performance and what change you expect with the model cahce?

Regards

andrewdibiasio6 commented 1 year ago

Here are the docs for modeling caching: https://www.metabase.com/docs/latest/data-modeling/models.html#model-caching

Maiquu commented 1 year ago

Implementation of model caching of postgres driver can be found at: https://github.com/metabase/metabase/blob/master/src/metabase/driver/postgres/ddl.clj

I wrote a very primitive implementation of model caching for the starburst driver. Due to my lack of clojure knowledge, most of the code is extracted from the postgres implementation. Since trino does not support all DDL statements in transactions, I removed the transaction wrappers and I haven't added rollbacks so beware. Source

Keep in mind that model caching uses CREATE TABLE AS and its not supported in all connectors. Its currently supported at following connectors.

EDIT: Forgot loading the ddl implementation in driver at the time I posted this. Should be fixed now.

leniartek commented 1 year ago

Thanks @Maiquu! We will work on model cache in 2023, I will share more details when we have more concrete ETA.

mir1198yusuf commented 1 year ago

My use case is : Data source 1 is postgres database Data source 2 is postgres database

Both are production databases but I need to join data between two in same query for analytics. I can do that with Trino. I am using Metabase. I want to save the query result for some hours. Save anywhere except in data sources 1 and 2. Metabase does this, it stores model cache in source db only which in turn again increases load on source db.

So if I can store the result of query in database 3 and Metabase always hit that database3 is ideal thing for me. Load on database 3 is not an issue.

Can this be possible ?

splashvarun commented 1 year ago

Currently using Trino on metabase, we have to select one data source as Metabase supports only 2 level hierarchy of databases and tables. If we can allow the capability to use mysql or postgres as the default option in the metabase, then use the model cache that Metabase connectors of those models support, it can solve this issue.

That means, on all trino (starburst) connector, select a metadata datasource which will be used internally by metabase to cache model. Then we write the native sql to query other production datasources that are configured in trino itself.