Open chenzl25 opened 1 year ago
I imagine that MVs are created one after another:
My worry is that once MVs are created, users may hesitate to change them, drop them, or modify them in any way just to share intermediate state with new MVs that are about to be created. This suggests that we may not have the luxury of optimizing multiple queries at the same time.
But I do think "truncate the intermediate materialized view" is quite useful as a standalone optimization.
Sometimes, the user may realize only after a while that the MV they really want is doing some further transformation of some existing MV, which may make the existing one obsolete.
Dropping the old one and completely re-building the new one could be slow.
Re-building from source may not even be possible in some cases due to the data retention limit in the upstream source.
After users finish the third step, we can provide a way to truncate the intermediate materialized views so that they no longer materialize their input and, finally, become invisible to users.
I was thinking of exactly this approach months ago!
However, after some offline discussions, we found that, in practice, it's inevitable that more materialized views are needed as the business grows, and "providing all queries at the same time" seems too idealistic. If we also want to apply the state reuse optimization to these new materialized views, then we have to find a way to do this in an incremental or patch-like way.
If this really gets implemented, then it could be a superset of the solution proposed in this issue. Since we're now able to do optimization incrementally, then it also works for creating materialized views one by one... 🤔
I do agree that, in practice, there will be more and more materialized views as the business grows. Anyway, it seems we can provide a conversion between views and materialized views. Converting a materialized view to a view seems to be exactly the "truncate materialized" idea mentioned before.
This issue has been open for 60 days with no activity. Could you please update the status? Feel free to continue discussion or close as not planned.
Any further updates?
Some users might maintain their streaming queries in this way.
If we create those sinks or materialized views one by one, it would result in duplicated states, since views could be used more than once. Currently, we support sharing states in a single query, but we are unable to share states across multiple streaming queries.
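To make the duplication concrete, here is a minimal toy model, not how the actual optimizer works. It assumes a plan is a tree of operators, that only some operators (e.g. aggregations, joins) hold state, and that a view is simply inlined into every query that uses it. The names `Op`, `stateful_ops`, and the example operators are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    """A hypothetical plan operator; frozen so equal subplans compare equal."""
    name: str
    inputs: tuple = ()
    stateful: bool = False


def stateful_ops(plan):
    """Collect the distinct stateful operators reachable from a plan root."""
    seen = set()
    stack = [plan]
    while stack:
        op = stack.pop()
        if op in seen:
            continue
        seen.add(op)
        stack.extend(op.inputs)
    return {op for op in seen if op.stateful}


# A view containing an aggregation, used by two downstream MVs.
src = Op("source")
agg_view = Op("agg_view", (src,), stateful=True)
mv1 = Op("filter_a", (agg_view,))
mv2 = Op("join_b", (agg_view, src), stateful=True)

# Created one by one: each MV builds its own copy of the view's state.
independent = len(stateful_ops(mv1)) + len(stateful_ops(mv2))
# Optimized together: identical subplans are built only once.
shared = len(stateful_ops(mv1) | stateful_ops(mv2))

print(independent, shared)
```

In this toy example the aggregation inside the view is counted twice when the MVs are created independently and only once when the plans are considered together.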
One possible solution is to optimize multiple queries at the same time so that we can have a bird's-eye view. Obviously, this would need end-to-end support for creating streaming queries in batch (e.g. in the optimizer, meta, and scheduler).
Another possible solution is to let users create intermediate materialized views instead of views for the second step. After users finish the third step, we can provide a way to truncate the intermediate materialized views so that they no longer materialize their input and, finally, become invisible to users.
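A rough sketch of what the proposed "truncate" could mean at the catalog level, purely as an illustration. The `Relation` fields and the `truncate_intermediate` function are hypothetical names, not an existing API, and the sketch does not model the operator state that downstream MVs would keep sharing.

```python
from dataclasses import dataclass, field


@dataclass
class Relation:
    """Hypothetical catalog entry for a view or materialized view."""
    name: str
    materialized: bool                               # True: results are stored
    visible: bool = True                             # users can see and query it
    rows: list = field(default_factory=list)         # stand-in for stored results
    dependents: list = field(default_factory=list)   # downstream MVs or sinks


def truncate_intermediate(rel):
    """Stop storing results and hide the relation, keeping its dependents."""
    if not rel.dependents:
        # Without downstream users there is nothing to keep feeding.
        raise ValueError(f"{rel.name} has no downstream MVs; drop it instead")
    rel.rows.clear()          # discard the stored results
    rel.materialized = False  # behaves like a plain view from now on
    rel.visible = False       # no longer shown to users
    return rel


intermediate = Relation(
    "intermediate_mv", materialized=True,
    rows=[("k1", 10), ("k2", 20)],
    dependents=["mv_a", "mv_b"],
)
truncate_intermediate(intermediate)
```

After the call, `intermediate` stores no rows and is hidden, while `mv_a` and `mv_b` remain registered as its dependents.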