Open maligulzar opened 7 years ago
Provenance queries should take in a time window to enable time sensitive debugging
We should keep the provenance for each micro batch.
We should localize given the test function.
We should also keep the aggregated state for each micro batch.
We should allow certain kinds of debugging / tracing queries that work on the aggregated states and localized provenance after all colllection.
Queries on each micro batch should be allowed only within a time window when the results /provenance are not flushed out from the stream.
This should be a good idea for MS project.
Selective provenance for Streaming application.
Leverage states on the spark streaming and use the test function to isolate the faulty microbatch. With the corresponding data from the localized microbatch, use previous states to perform automated debugging. Replay is harder in the streaming.