opensource-observer / oso

Measuring the impact of open source software
https://opensource.observer
Apache License 2.0

Dagster + SQLMesh Metrics: Use DuckDB as a pre-warmed cache for rolling metrics #2445

Closed ravenac95 closed 3 days ago

ravenac95 commented 2 weeks ago

What is it?

Based on some previous testing I've done (as part of #2430), we can get the metrics to run reasonably fast on a very large DuckDB instance. In our initial version, the way the SQLMesh rolling windows ran made deletes and writes into Trino exceedingly slow. By using DuckDB as a pre-warmed cache, we can distribute the metric calculations across a cluster of pre-warmed DuckDB instances and then write the results back to the Trino warehouse.
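To make the idea concrete, here is a minimal sketch of the pattern, not the actual implementation from #2469. It uses Python's built-in `sqlite3` in-memory database as a stand-in for a pre-warmed DuckDB instance so that it runs without extra dependencies. The `events` table, its columns, and the function names `warm_cache`, `rolling_metric`, and `compute_batch` are all hypothetical. The point is that the source data is loaded into the cache once, every rolling window is computed against that cache, and the results come back as a single batch that could be appended to the warehouse, instead of issuing a delete and a write per window.

```python
import sqlite3
from datetime import date, timedelta


def warm_cache(events):
    """Load source events into an in-memory database once (the 'pre-warmed' cache).

    sqlite3 is only a stand-in here; the issue proposes DuckDB for this role.
    """
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE events (project TEXT, day TEXT, amount INTEGER)")
    con.executemany("INSERT INTO events VALUES (?, ?, ?)", events)
    con.execute("CREATE INDEX idx_events_day ON events (day)")
    return con


def rolling_metric(con, end_day, window_days):
    """Sum `amount` per project over the window ending on `end_day` (inclusive)."""
    start_day = end_day - timedelta(days=window_days - 1)
    rows = con.execute(
        "SELECT project, SUM(amount) FROM events "
        "WHERE day BETWEEN ? AND ? GROUP BY project ORDER BY project",
        (start_day.isoformat(), end_day.isoformat()),
    ).fetchall()
    return [(end_day.isoformat(), project, total) for project, total in rows]


def compute_batch(events, end_days, window_days):
    """Work done by one (hypothetical) worker: warm the cache, compute its windows.

    In a distributed setup, each worker would receive its own slice of `end_days`.
    The returned rows form one append-only batch for the warehouse.
    """
    con = warm_cache(events)
    try:
        batch = []
        for end_day in end_days:
            batch.extend(rolling_metric(con, end_day, window_days))
        return batch
    finally:
        con.close()


# Hypothetical sample data: (project, ISO day, amount).
EVENTS = [
    ("a", "2024-01-01", 1),
    ("a", "2024-01-02", 2),
    ("b", "2024-01-02", 5),
    ("a", "2024-01-03", 4),
]

RESULT = compute_batch(EVENTS, [date(2024, 1, 2), date(2024, 1, 3)], window_days=2)
```

Each tuple in `RESULT` is `(window end day, project, rolling sum)`. Writing those rows back to Trino in bulk is left out of the sketch, since that step depends on the warehouse setup.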

ravenac95 commented 3 days ago

Closing this with #2469