Related to #1981 and #1982
SQLMesh was attempted, but the current issue is that it doesn't support running on ClickHouse. Additionally, if we are to run this on BigQuery, we still need to finish writing a dagster-sqlmesh adapter. I have one I've been working on personally that I'd be willing to contribute if we want it, but it's still missing a few necessary features.
We are currently testing cube.dev. In general it seems nice, but there is a set of problems I'm still debugging. Namely, some of our classifications for metrics (dimensions in Cube) are computed over a rolling window. Creating a dimension like this isn't supported, so I've tried a few workarounds and none of them work yet. The latest issue appears to be that rolling window measures have some kind of bug when used against ClickHouse. I'm testing a few more things there, but we may need a hybrid of processing rolling windows with SQLMesh and using Cube for querying some of the preprocessed metrics. A sketch of the kind of rolling window measure involved is below.
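For reference, this is a minimal sketch of a Cube data model with a rolling window measure, roughly the shape of what's failing against ClickHouse. The cube, table, and column names are assumptions for illustration, not our actual schema; note that Cube evaluates rolling window measures relative to the time dimension date range supplied in the query.

```js
// Hypothetical Cube data model (schema file). Names are illustrative only.
cube(`Events`, {
  sql: `SELECT * FROM events_daily_to_artifact`,

  measures: {
    // Simple total; this works fine.
    amount: {
      sql: `amount`,
      type: `sum`,
    },

    // Trailing 30-day sum. This is the kind of rolling window measure that
    // appears to hit a bug when Cube generates the query for ClickHouse.
    amount30d: {
      sql: `amount`,
      type: `sum`,
      rollingWindow: {
        trailing: `30 day`,
        offset: `end`,
      },
    },
  },

  dimensions: {
    artifactId: {
      sql: `artifact_id`,
      type: `string`,
    },

    bucketDay: {
      sql: `bucket_day`,
      type: `time`,
    },
  },
});
```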
I'm going to close this now in favor of breaking down the remaining work. This was a large chunk of the sprint, but the remaining work is now to make some pragmatic choices to get this working as a hybrid of SQLMesh + Cube + ClickHouse. A rough sketch of what that hybrid could look like on the Cube side follows.
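To make the hybrid concrete, here is a hedged sketch of the Cube side if SQLMesh materializes the rolling windows ahead of time: Cube then only does plain aggregation over a precomputed column and never has to generate a rolling window query against ClickHouse. The table and column names (`metrics_rolling_30d_to_artifact`, `amount_30d`, etc.) are hypothetical.

```js
// Hypothetical Cube data model over a table that SQLMesh would materialize
// in ClickHouse with the rolling values already computed.
cube(`RollingMetrics`, {
  sql: `SELECT * FROM metrics_rolling_30d_to_artifact`,

  measures: {
    // Reads the precomputed rolling value; most meaningful when the query
    // groups by artifact and day, where each group has exactly one row.
    amount30d: {
      sql: `amount_30d`,
      type: `max`,
    },
  },

  dimensions: {
    artifactId: {
      sql: `artifact_id`,
      type: `string`,
    },

    bucketDay: {
      sql: `bucket_day`,
      type: `time`,
    },
  },
});
```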
What is it?
The timeseries metrics have proven too large for dbt and BigQuery to execute the way we want them executed. Several things were attempted to get them working properly, which has unfortunately led to a larger investigation.
Things tried:

- Splitting the int_timeseries_ models into smaller models. This led to the discovery that generating events for all artifacts created too large a model (55TB for developer artifact days).

The remaining methods to try (in order) now are:
Other things that have been considered and not tried:
The main reason we didn't do most of these is that we want to be able to expose exploring the dimensions of our metrics to users. These could still be useful in that world.