expose graph object

CJ-Wright commented 6 years ago

It may be nice to expose a graph object which actually expresses the entire graph as a single entity. This could also make adding/removing nodes easier as we would have networkx methods.

CJ-Wright commented 6 years ago

Brain dump of current issues about this:

I like our current namespace-based system. The ability to collapse nodes and give them names is nice. This also means that we don't have to give anything in the graph names if we don't want to.
Our current namespace model, however, produces a bunch of issues, since it is difficult (although not impossible with reload) to get more than one copy of a graph at a time. This matters when data is similar but can't be processed in the exact same graph simultaneously. E.g., I could have a graph that processes live weather data, but I can't process data from different locations in the same graph. This usually shows up when caches (combine_latest et al.) are not globally the same.
We could have a pipeline factory report a simple namespace. This would provide a way to get at all the nodes we choose to name.
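A minimal sketch of what such a factory could look like. `Node` and `weather_pipeline` are hypothetical stand-ins written for this example, not the streamz API; the only point is that each call builds a fresh copy of the graph and reports just the names we chose to expose.

```python
from types import SimpleNamespace


class Node:
    """Hypothetical stand-in for a stream node (not the streamz API)."""

    def __init__(self, func=None, upstream=None):
        self.func = func or (lambda x: x)
        self.upstreams = []
        self.downstreams = []
        if upstream is not None:
            upstream.connect(self)

    def connect(self, downstream):
        self.downstreams.append(downstream)
        downstream.upstreams.append(self)

    def emit(self, x):
        # Apply this node's function, then push the result downstream.
        result = self.func(x)
        for child in self.downstreams:
            child.emit(result)
        return result


def weather_pipeline():
    """Build an independent copy of the graph; expose only the named parts."""
    source = Node()
    to_celsius = Node(lambda f: (f - 32) * 5 / 9, upstream=source)
    readings = []
    Node(readings.append, upstream=to_celsius)  # unnamed sink node
    return SimpleNamespace(source=source, readings=readings)


# Two locations, two graphs, no shared state between them.
nyc = weather_pipeline()
boston = weather_pipeline()
nyc.source.emit(212)
boston.source.emit(32)
```

Because every call creates new nodes, per-graph caches cannot leak between locations, and the unnamed intermediate nodes stay unnamed.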
Linking is annoying. Here, linking is what happens when two or more pipelines need to be combined. In the simple case this is done in a single module, which is nice. However, this breaks reusability, since we'd like to make pipeline chunks as reusable as possible. To get around this we could have multiple places producing chunks of pipeline. This causes two issues: 1) downstream nodes generally need an upstream node to subscribe to, so we'd either need to deal with that or write a no-op source node (or nodes) for each pipeline chunk; 2) we now need to link subgraphs together to properly plumb the data. Previously this was all handled explicitly by the definition of the pipeline. It might be nice to have some automated way to link graphs. This would allow us to not know what the downstreams need or what the upstreams provide, and just express what a subgraph needs for input and what it outputs. This could produce issues when trying to link cyclic graphs or multiple subgraphs at once, but I think the solution is to require that subgraphs expose a well-defined surface area, so cycles and the like are held within the subgraph. This could be handled by examining the simple namespace for the names it exposes and providing the links. Something like:
```
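def graph_linker(upstream_graph, downstream_graph):
    # Assumes both graphs are simple namespaces mapping names to nodes.
    for name in list(vars(downstream_graph)):
        if hasattr(upstream_graph, name):
            upstream_node = getattr(upstream_graph, name)
            dummy_node = getattr(downstream_graph, name)
            # Connect every child of the dummy node to the real upstream node.
            for downstream_child in list(dummy_node.downstreams):
                upstream_node.connect(downstream_child)
            # Drop the dummy node's name from the downstream namespace.
            delattr(downstream_graph, name)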
```
Maybe we should be more clever here and actually bypass the dummy node. This would remove any naming confusion since we would remove the dummy node in the process. This would make certain that any further links to that name would reference the correct node and not the dummy node.
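A rough sketch of the bypass idea, assuming nodes have `connect` and `disconnect` methods and that each graph is a simple namespace. The `Node` class and `link_and_bypass` are hypothetical names made up for this example.

```python
from types import SimpleNamespace


class Node:
    """Hypothetical minimal node; it only tracks its neighbours."""

    def __init__(self, label):
        self.label = label
        self.upstreams = []
        self.downstreams = []

    def connect(self, downstream):
        self.downstreams.append(downstream)
        downstream.upstreams.append(self)

    def disconnect(self, downstream):
        self.downstreams.remove(downstream)
        downstream.upstreams.remove(self)


def link_and_bypass(upstream_graph, downstream_graph):
    """Splice shared names together and drop the downstream dummy node."""
    for name in list(vars(downstream_graph)):
        if not hasattr(upstream_graph, name):
            continue
        real = getattr(upstream_graph, name)
        dummy = getattr(downstream_graph, name)
        for child in list(dummy.downstreams):
            dummy.disconnect(child)  # the dummy no longer feeds the child
            real.connect(child)      # the real node feeds it instead
        # Keep the name, but make it refer to the real node from now on.
        setattr(downstream_graph, name, real)
    return downstream_graph


up = SimpleNamespace(raw=Node("raw"), calibrated=Node("calibrated"))
up.raw.connect(up.calibrated)

placeholder = Node("placeholder")
plot = Node("plot")
placeholder.connect(plot)
down = SimpleNamespace(calibrated=placeholder, plot=plot)

link_and_bypass(up, down)
```

After linking, `down.calibrated` is the real upstream node, so any later link against that name lands on the right node and the placeholder is left with no children.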
Using graphs is problematic because of names. The names of nodes in a graph must be unique. This causes issues for us since we don't actually assign names to every node. Additionally, we would like the linker to link nodes whose names are the same. This issue could be resolved by the dummy node removal process above.
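One possible way around the uniqueness requirement: networkx accepts any hashable object as a node, so the node objects themselves could act as graph keys and the name could be an optional attribute. The sketch below stays in plain Python and only collects nodes and edges by identity; `Node` and `collect_graph` are hypothetical, and the resulting edge list is the kind of thing that could be handed to `networkx.DiGraph.add_edges_from`.

```python
class Node:
    """Hypothetical node; hashable by identity, with an optional name."""

    def __init__(self, name=None):
        self.name = name
        self.downstreams = []

    def connect(self, downstream):
        self.downstreams.append(downstream)
        return downstream


def collect_graph(sources):
    """Walk downstream from the sources and return (nodes, edges)."""
    nodes, edges = [], []
    seen = set()
    stack = list(sources)
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        nodes.append(node)
        for child in node.downstreams:
            edges.append((node, child))
            stack.append(child)
    return nodes, edges


source = Node("source")
a = source.connect(Node())  # unnamed
b = source.connect(Node())  # unnamed
sink = a.connect(Node("sink"))
b.connect(sink)

nodes, edges = collect_graph([source])
named = sorted(n.name for n in nodes if n.name)
```

The two unnamed nodes are still distinct graph entries because they are keyed by object identity rather than by name.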
We may also need to override the existing add_node and add_edge logic for networkx digraphs, since we'd want this as a user interface for adding nodes and edges to the graph. add_node is a bit of a misnomer, though: it could only be used for source nodes, because no other node type can exist without an upstream node.
Having nodes in a graph could help with visualization, as we need to visualize the graph as a whole entity. This would also give us nice flexibility on the node attributes and plotting.

CJ-Wright commented 6 years ago

The linking could also maybe be done by computing the boundary of the graph?
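A small sketch of what computing the boundary could mean here, assuming the graph is available as a list of directed edges and a subgraph is just a set of nodes. The `boundary` function and the node labels are made up for illustration; networkx also ships boundary helpers (`node_boundary`, `edge_boundary`) that might do the job on a real graph object.

```python
def boundary(edges, members):
    """Return (inputs, outputs) for the subgraph whose nodes are `members`.

    inputs:  members fed by a node outside the subgraph
    outputs: members feeding a node outside the subgraph
    """
    members = set(members)
    inputs = {dst for src, dst in edges if dst in members and src not in members}
    outputs = {src for src, dst in edges if src in members and dst not in members}
    return inputs, outputs


edges = [
    ("raw", "filter"),
    ("filter", "average"),
    ("average", "filter"),  # cycle held inside the subgraph
    ("average", "plot"),
]
inputs, outputs = boundary(edges, {"filter", "average"})
```

The internal cycle does not show up in either set, so the linker would only ever see the subgraph's exposed surface.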

CJ-Wright commented 6 years ago

Either of these approaches could be put inside a factory which returns the namespace or the graph. One difficulty with this will be the rolling over of state from one pipeline to another. Although maybe it would be best to not worry about this for now.

xpdAcq / streamz_ext

expose graph object #12