python-streamz / streamz

Real-time stream processing for python
https://streamz.readthedocs.io/en/latest/
BSD 3-Clause "New" or "Revised" License

visualizing streams and changing variables during runtime #464

Open OpenCoderX opened 1 year ago

OpenCoderX commented 1 year ago

Does anyone have any suggestions or experience generating a real-time visualization of streams and the events flowing through the pipeline?

After I get a real-time viz going, I want to be able to select a stream, operator, or edge between operators and modify its state. For example, temporarily disconnect one stream from another, or change a variable that is currently static, such as the condition in a filter operator.

martindurant commented 1 year ago

The instances of streamz are pretty simple things, and you can mutate them in place freely during runtime.

To dynamically attach and detach nodes, you want the .connect() and .disconnect() methods, called on the upstream node. Events emitted while disconnected will then be dropped.

Similarly, you can mutate any attribute of a streamz node instance. Some nodes, like accumulate, have explicit .state attributes that hold the "current" value, and you can set them to whatever you would like. For the filter node, the attribute is called .predicate and, again, you can reassign it at will. A little investigation of the source of any node you want to change will tell you what the attributes are and what type should be assigned to them.

Note that some sources also have explicit .start()/.stop() methods, because they poll some external provider of events, and these methods allow you to pause that polling.

martindurant commented 1 year ago

Oh, and for visualising, I don't think there's any tool that will redraw the event graph itself upon changes such as I've described. Probably holoviz tools could be made to do it with a little hackery. Since the main network viz option for streamz is graphviz, it is restricted to static output (but you could call it multiple times, whenever).

OpenCoderX commented 1 year ago

Thank you @martindurant, that is helpful. I think I need to learn some techniques to introspect the entire streamz graph from a separate thread that runs my visualization. Maybe I can branch every operator and mirror the metadata and events into this new visualization thread, which would display the events and operator status. I could also create a feedback stream into each operator object that calls a function '_update_operator', which would then mutate the stream. Another way of putting it: every stream operator has an upstream that it watches for management events, and if it receives a management event, it calls _update_operator and mutates itself.

OpenCoderX commented 1 year ago

The more I think about this, the more I think the streams that mirror data and state out to the management thread/system should not be upstreams in the existing sense of the term. I'll probably look at introducing a dedicated management stream and a management feedback stream. These management streams will only carry interactions between the management thread and the stream operators.

martindurant commented 1 year ago

Please do let us know what you end up implementing. I'm not convinced that you need a separate thread to do what you are suggesting - there is no reason two disconnected event graphs can't be run by the same event loop. However, if they are listening for very different kinds of events (e.g., user actions versus socket or timing signals) then you might be right.