shed-streaming
Streaming Heterogeneous Event Data
Current Design/Architecture
- The tooling for the event model management should be as transparent and
small as possible.
  - shed-streaming accomplishes this by having only two additional nodes,
    FromEventStream and ToEventStream, which convert data from the event
    model to base types/numpy and from base types/numpy to the event model.
  - Everything else will be handled by streamz nodes operating on base
    types and numpy.
- We should track the data provenance with as little burden on the user
as possible.
  - Since the users have agreed to be part of our streamz-based ecosystem,
    we should track data provenance without any additional work on the
    user's part.
  - This is accomplished by having the translation nodes keep track of:
    - the source of the data coming into the graph
    - when the data entered the graph
    - the graph itself
- Data provenance should support:
  - Replaying data analysis
  - Environment tracking
  - Playing new data through old analysis
  - Editing analysis and replaying
- Data should be stored via a DataBroker, which has a similar structure
  to the experimental data.
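
The translation and provenance ideas above can be illustrated with a minimal, standard-library-only sketch. It is not the shed-streaming API: `from_event_stream`, `to_event_stream`, the `provenance` dict, and the document fields used here are hypothetical stand-ins, and the event-model documents are simplified to `(name, doc)` pairs. The sketch also drains its input in one batch, which a real streaming node would not do.

```python
import time
import uuid


def from_event_stream(documents, data_key, provenance):
    """Yield plain values for data_key from (name, doc) pairs.

    Hypothetical stand-in for a FromEventStream-style node.
    """
    for name, doc in documents:
        if name == "start":
            # Record the source of the data and when it entered the graph.
            provenance["parent_uids"].append(doc["uid"])
            provenance["entered_at"].append(time.time())
        elif name == "event":
            yield doc["data"][data_key]


def to_event_stream(values, data_key, provenance, graph):
    """Wrap plain values back into (name, doc) pairs.

    Hypothetical stand-in for a ToEventStream-style node.
    """
    # Simplification: drain upstream first so the provenance record is
    # filled in before the "start" document is built.
    values = list(values)
    start_uid = str(uuid.uuid4())
    descriptor_uid = str(uuid.uuid4())
    yield "start", {
        "uid": start_uid,
        "time": time.time(),
        "parent_uids": list(provenance["parent_uids"]),
        "entered_at": list(provenance["entered_at"]),
        "graph": graph,  # assumed: a plain-text description of the graph
    }
    yield "descriptor", {
        "uid": descriptor_uid,
        "run_start": start_uid,
        "data_keys": [data_key],
    }
    for seq_num, value in enumerate(values, start=1):
        yield "event", {
            "descriptor": descriptor_uid,
            "seq_num": seq_num,
            "data": {data_key: value},
        }
    yield "stop", {"run_start": start_uid, "exit_status": "success"}


# Made-up raw documents standing in for experimental data.
raw_documents = [
    ("start", {"uid": "raw-run-1"}),
    ("descriptor", {"uid": "raw-desc-1", "run_start": "raw-run-1"}),
    ("event", {"descriptor": "raw-desc-1", "seq_num": 1, "data": {"img": [1, 2, 3]}}),
    ("event", {"descriptor": "raw-desc-1", "seq_num": 2, "data": {"img": [4, 5, 6]}}),
    ("stop", {"run_start": "raw-run-1"}),
]

provenance = {"parent_uids": [], "entered_at": []}
values = from_event_stream(raw_documents, "img", provenance)
# The analysis step sees only base types, never event-model documents.
totals = (sum(v) for v in values)
analyzed = list(
    to_event_stream(totals, "img_sum", provenance, graph="img -> sum -> img_sum")
)
```

In this sketch the analysis step (`sum`) never touches event-model documents, and the outgoing start document carries the parent run's uid, the entry time, and a description of the graph without the user doing anything extra. How shed-streaming itself serializes the graph and stores the result is not shown here.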