Open dai-chen opened 5 months ago
For MV, the typical use case is to support using new column in MV query when schema evolution on source table.
After a quick research, it appears that no major database vendors currently support altering the underlying query of a materialized view. As an alternative workaround, users can create a new materialized view with the updated query and preserve access to both old and new datasets by SQL UNION
or OpenSearch index pattern.
Example:
-- Create the first materialized view CREATE MATERIALIZED VIEW mv_1 AS SELECT ... GROUP BY TUMBLE(time, '1 Hour') WITH ( auto_refresh = true ); -- Alter the materialized view to stop auto refresh (not necessary for manual refresh) ALTER MATERIALIZED VIEW mv2 WITH ( auto_refresh = false ); /* * Identify where the old materialized view stops (e.g. last bucket is '2024-09-01 10:00:00'). * Data after this point is either unprocessed or held in the state store as an open window pending output. * a) Stop right before '2024-09-01 11:00:00': the data will be processed by new materialized view * b) Stop in the middle of '2024-09-01 11:00:00': the data will be re-processed by new materialized view */ -- Create new materialized view from next time interval CREATE MATERIALIZED VIEW mv_1 AS SELECT ... WHERE time > '2024-09-01 10:00:00' GROUP BY TUMBLE(time, 'interval') WITH ( auto_refresh = true ); -- Query both old and new materialized view data using index pattern in OpenSearch POST /flint_mv_*/_search
Is your feature request related to a problem?
Currently, the ALTER INDEX statement in Flint only supports changing index options, such as index refresh mode (refer to the documentation: ALTER INDEX Options). For table formats like Iceberg, which support schema evolution, there is no way for users to add more indexed columns without re-creating the index.
What solution would you like?
Introduce support for schema changes in the
ALTER INDEX
statement similar to theALTER TABLE
statement. This would include:ALTER SKIPPING INDEX ON <tableName> ADD COLUMN ...
ALTER INDEX <indexName> ON <tableName> ADD COLUMN ...
What alternatives have you considered?
The only current alternative is for users to re-create the index whenever they need to add more indexed columns.
Do you have any additional context?
It is important to note that this change will impact the correctness of the query rewriter, as the new columns added to the Flint index will be missing in old data: