maligulzar / bigdebug

Apache License 2.0

Provenance on Stream Processing #18

Open maligulzar opened 7 years ago

maligulzar commented 7 years ago

Leverage the state that Spark Streaming maintains and use the test function to isolate the faulty micro-batch. With the data from the localized micro-batch, use the previous states to perform automated debugging. Replay is harder in a streaming setting than in batch processing.
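To make the idea concrete, here is a minimal sketch in plain Python, not the Spark Streaming API. It assumes the job can be modeled as a fold of an `update` function over micro-batches and that the user supplies a `test` predicate on the state; the names `localize_faulty_batch`, `update`, and `test` are hypothetical.

```python
from typing import Callable, Iterable, Optional, Tuple, TypeVar

S = TypeVar("S")  # type of the running state
B = TypeVar("B")  # type of one micro-batch


def localize_faulty_batch(
    batches: Iterable[B],
    update: Callable[[S, B], S],
    test: Callable[[S], bool],
    initial_state: S,
) -> Optional[Tuple[int, S]]:
    """Return (index of first failing micro-batch, state just before it).

    Returns None when every intermediate state passes the test.
    """
    state = initial_state
    for index, batch in enumerate(batches):
        previous = state
        state = update(previous, batch)
        if not test(state):
            # The previous state plus this batch is enough to re-run
            # the failing step in isolation.
            return index, previous
    return None


# Hypothetical example: a running sum that must never go negative.
batches = [[1, 2], [3, -10], [4]]
result = localize_faulty_batch(
    batches,
    update=lambda total, batch: total + sum(batch),
    test=lambda total: total >= 0,
    initial_state=0,
)
```

Returning the state from just before the failing micro-batch means only that one batch has to be re-executed, which sidesteps a full replay of the stream.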

maligulzar commented 7 years ago

Provenance queries should take a time window as input to enable time-sensitive debugging.
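A rough sketch of what a time-windowed provenance query could look like, again in plain Python. The record layout (`batch_time`, `input_id`, `output_id`) and the function `query_provenance` are assumptions made for illustration, not part of BigDebug.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ProvenanceRecord:
    batch_time: float  # timestamp of the micro-batch, in seconds (assumed field)
    input_id: str      # identifier of the contributing input record (assumed)
    output_id: str     # identifier of the output it contributed to (assumed)


def query_provenance(
    records: List[ProvenanceRecord],
    output_id: str,
    start: float,
    end: float,
) -> List[ProvenanceRecord]:
    """Trace an output back to its inputs, restricted to the window [start, end)."""
    return [
        r for r in records
        if r.output_id == output_id and start <= r.batch_time < end
    ]


records = [
    ProvenanceRecord(0.0, "in-1", "out-a"),
    ProvenanceRecord(10.0, "in-2", "out-a"),
    ProvenanceRecord(20.0, "in-3", "out-a"),
]
# Only inputs from micro-batches in [10, 30) are returned.
recent = query_provenance(records, "out-a", start=10.0, end=30.0)
```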

miryung commented 7 years ago

We should keep the provenance for each micro batch.

We should localize the fault to a micro batch using the given test function.

We should also keep the aggregated state for each micro batch.

We should allow certain kinds of debugging / tracing queries that work on the aggregated states and the localized provenance after all collection is done.

Queries on each micro batch should be allowed only within a time window during which the results / provenance have not yet been flushed out of the stream.
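The points above could be sketched roughly as follows, in plain Python rather than Spark. The class `MicroBatchStore`, its `retention` parameter, and its method names are hypothetical; the sketch only illustrates keeping provenance and aggregated state per micro batch and rejecting queries once an entry has been flushed.

```python
from typing import Any, Dict, List, Tuple


class MicroBatchStore:
    """Keeps (provenance, aggregated state) per micro batch for a limited time."""

    def __init__(self, retention: float) -> None:
        self.retention = retention  # seconds an entry stays queryable (assumed unit)
        self._entries: Dict[float, Tuple[List[Any], Any]] = {}

    def record(self, batch_time: float, provenance: List[Any], state: Any) -> None:
        # Store the provenance and the aggregated state after this micro batch.
        self._entries[batch_time] = (provenance, state)

    def flush(self, now: float) -> None:
        # Drop every entry that is older than the retention window.
        expired = [t for t in self._entries if now - t > self.retention]
        for t in expired:
            del self._entries[t]

    def query(self, batch_time: float, now: float) -> Tuple[List[Any], Any]:
        # Queries succeed only while the entry has not been flushed.
        self.flush(now)
        if batch_time not in self._entries:
            raise KeyError(f"micro batch {batch_time} is outside the retention window")
        return self._entries[batch_time]


store = MicroBatchStore(retention=30.0)
store.record(0.0, ["in-1"], 1)
store.record(20.0, ["in-2"], 3)
# At time 40 the batch from time 20 is still retained; the one from time 0 is not.
provenance, state = store.query(20.0, now=40.0)
```

A fixed retention window is just one possible flushing policy; the thread does not specify how or when results are flushed.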

This should be a good idea for an MS project.

maligulzar commented 7 years ago

Selective provenance for streaming applications.