python-streamz / streamz

Real-time stream processing for python
https://streamz.readthedocs.io/en/latest/
BSD 3-Clause "New" or "Revised" License
1.23k stars 145 forks source link

extend networkx #188

Open CJ-Wright opened 5 years ago

CJ-Wright commented 5 years ago

Networkx provides many great tools for inspection, analysis, and manipulation of graphs. It might be nice to be able to use these tools when working with streamz.

Use Cases: UC1:

  1. User1 writes a graph (graph1) to perform some analysis
  2. User2 would like to extend this analysis so they write a new pipeline (graph2) with dummy parent nodes with the same name as the nodes from graph1.
  3. User2 then uses graph3 = nx.compose(graph2, graph1). Graph3 has nodes from both graphs allowing the graphs themselves to be built modularly.

UC2:

  1. User3 suspects something went wrong in the compose step because of a name mismatch. User3 uses [f for f in graph3 if nx.predecessors(f) ==0] to find all the nodes with no parents. User3 then looks through the list and finds a node who's name didn't match and thus was not linked properly.

I think this can be done via a subclass of nx.DiGraph with select methods overridden, namely:

  1. add_node which now needs to take in a node class, args and kwargs and a name
  2. add_edge which now need to take in either
    1. two node classes with args and kwargs (producing two new nodes)
    2. a node class with args and kwargs and a name
    3. two names
  3. add_edge_from so compose works properly.

Most of the overrides either need to init new nodes or provide connections between nodes.

OpenCoderX commented 5 years ago

I'm excited to see this issue as I integrated streamz and networkx in my trading bot, it works very well.

martindurant commented 5 years ago

@opensourcechris , I'm sure the authors would be very appreciative to hear of your use case and possibly even code, in as far as you are able to share!

OpenCoderX commented 5 years ago

We have a networkx graph model of the forex exchanges, each node has a Streaming dataframe that sinks market data to the next node in the calculation. So they end up becoming one graph with the Steamz df becoming an attribute on the forex graph, but we conceptually think of them as two different graphs with the Streamz graph having nodes not it the forex graph, like when we take market data from two market and zip it into a Streaming dataframe node. We did not modify the Streamz library though. We did subclass DiGraph so I can agree on that point. Networkx also made it easier to visualize the data streams we send to the browser, we traverse the graph to find the nodes the user wants to visualize. Not sure if that helps.

CJ-Wright commented 5 years ago

@opensourcechris Would you be able to put some of that code up? I'm particularly interested to see the subclass of DiGraph (since I was working on that too) and the browser driven visualization.

OpenCoderX commented 5 years ago

Probably I could share some obfuscated snippets, let me check with my leadership.