Is your feature request related to a problem?
Currently there is no way to check whether a Flint index refresh has caught up with the source data. Users need to know whether a Flint index is up to date.
What solution would you like?
There is no direct correspondence between each micro-batch and the source data:
For raw datasets, there is no versioning at all;
For table formats such as Iceberg, snapshots do not map one-to-one to micro-batches either.
The proposed solution is to derive this correspondence from the committed offsets in the streaming job's checkpoint:
For raw datasets, determine how many files have been refreshed;
For table formats, determine the snapshot ID that has already been processed.
This information can be returned to users through a column in the SHOW FLINT INDEX statement output.
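The checkpoint lookup proposed above could look roughly like the Python sketch below. It is a minimal illustration under stated assumptions, not Flint's implementation: it assumes Spark Structured Streaming's on-disk checkpoint layout (`commits/<batchId>` marks a finished batch, `offsets/<batchId>` holds a version line, a metadata line, then one offset per source, with `-` for a source that has no offset yet), that a Spark file source records `{"logOffset": N}` and lists its files in `sources/<i>/` (with `N.compact` files holding every entry up to batch N), and that Iceberg's streaming offset JSON carries a `snapshot_id` field. All function names are hypothetical, and the checkpoint format is Spark-internal, so it may differ between versions.

```python
import json
import os
from typing import Optional


def latest_committed_batch(checkpoint: str) -> Optional[int]:
    """Highest batch ID with a marker file under <checkpoint>/commits, or None."""
    commits = os.path.join(checkpoint, "commits")
    if not os.path.isdir(commits):
        return None
    # Hidden files such as ".0.crc" are skipped because they are not all digits.
    ids = [int(name) for name in os.listdir(commits) if name.isdigit()]
    return max(ids) if ids else None


def parse_offset_log(text: str) -> list:
    """Parse offsets/<batchId> (assumed: version line, metadata JSON, one line per source)."""
    lines = text.splitlines()
    if not lines or not lines[0].startswith("v"):
        raise ValueError("unexpected offset log header")
    # "-" is assumed to mean "this source has no offset yet".
    return [None if line.strip() == "-" else json.loads(line)
            for line in lines[2:] if line.strip()]


def describe_progress(offset: Optional[dict]) -> str:
    """Turn one source offset into a human-readable progress marker."""
    if offset is None:
        return "no data processed yet"
    if "snapshot_id" in offset:
        # Assumed Iceberg streaming offset: last snapshot already processed.
        return f"snapshot {offset['snapshot_id']}"
    if "logOffset" in offset:
        # Assumed Spark file source offset: a batch ID in sources/<i>/.
        return f"file source log batch {offset['logOffset']}"
    return json.dumps(offset, sort_keys=True)


def committed_progress(checkpoint: str) -> list:
    """Progress marker for every source as of the latest committed batch."""
    batch = latest_committed_batch(checkpoint)
    if batch is None:
        return []
    with open(os.path.join(checkpoint, "offsets", str(batch))) as f:
        return [describe_progress(o) for o in parse_offset_log(f.read())]


def count_source_files(source_log_dir: str, log_offset: int) -> int:
    """Count file entries recorded for batches 0..log_offset of a file source.

    Assumes each log file is a version line followed by one JSON entry per
    line, and that N.compact already contains every entry up to batch N.
    """
    suffix = ".compact"
    compacted = [int(n[: -len(suffix)]) for n in os.listdir(source_log_dir)
                 if n.endswith(suffix) and n[: -len(suffix)].isdigit()]
    compacted = [c for c in compacted if c <= log_offset]
    paths, start = [], 0
    if compacted:
        last = max(compacted)
        paths.append(os.path.join(source_log_dir, f"{last}{suffix}"))
        start = last + 1
    paths += [os.path.join(source_log_dir, str(b)) for b in range(start, log_offset + 1)]
    total = 0
    for path in paths:
        if os.path.exists(path):
            with open(path) as f:
                total += sum(1 for line in f.read().splitlines()[1:] if line.strip())
    return total
```

Reading only committed batches matters because a batch's offsets are written before it is processed; an uncommitted `offsets/<batchId>` describes work that may not have reached the index yet. For an Iceberg source, comparing the returned snapshot ID with the table's current snapshot would indicate whether the index has caught up.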
What alternatives have you considered?
N/A
Do you have any additional context?
Quick test with an Iceberg table: