Open CJ-Wright opened 6 years ago
Brain dump of current issues about this:
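`reload` to get more than one copy of a graph at a time). This is important when data is similar but can't be processed in the exact same graph simultaneously. For example, I could have a graph that processes live weather data, but I can't process data from different locations in the same graph. This usually shows up when caches (`combine_latest` et al.) are not globally the same.

```python
def graph_linker(upstream_graph, downstream_graph):
    """Attach children of same-named downstream nodes to the upstream nodes."""
    # Copy the names first, since attributes are deleted during the loop.
    for name in list(vars(downstream_graph)):
        if hasattr(upstream_graph, name):
            dummy = getattr(downstream_graph, name)
            # Re-attach each child of the dummy node to the real upstream node.
            for downstream_child in list(dummy.downstreams):
                getattr(upstream_graph, name).connect(downstream_child)
            # Drop the dummy so the name no longer resolves to it.
            delattr(downstream_graph, name)
```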
Maybe we should be more clever here and actually bypass the dummy node. This would remove any naming confusion since we would remove the dummy node in the process. This would make certain that any further links to that name would reference the correct node and not the dummy node.
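A minimal sketch of that bypass, assuming graphs are plain namespaces of nodes. The `Node` class and `link_bypassing_dummies` are hypothetical stand-ins written for illustration, not the library's API. Instead of deleting the name, it rebinds the name to the real node so later lookups can't land on the dummy.

```python
from types import SimpleNamespace


class Node:
    """Toy stand-in for a pipeline node (hypothetical, not the real API)."""

    def __init__(self, name, upstream=None):
        self.name = name
        self.upstreams = []
        self.downstreams = []
        if upstream is not None:
            upstream.connect(self)

    def connect(self, child):
        self.downstreams.append(child)
        child.upstreams.append(self)

    def disconnect(self, child):
        self.downstreams.remove(child)
        child.upstreams.remove(self)


def link_bypassing_dummies(upstream_graph, downstream_graph):
    """Rewire children of same-named dummy nodes onto the real upstream nodes."""
    for name in list(vars(downstream_graph)):
        if not hasattr(upstream_graph, name):
            continue
        real = getattr(upstream_graph, name)
        dummy = getattr(downstream_graph, name)
        for child in list(dummy.downstreams):
            dummy.disconnect(child)  # the dummy no longer feeds the child
            real.connect(child)      # the real node feeds it instead
        # Point the name at the real node so later links cannot hit the dummy.
        setattr(downstream_graph, name, real)


# Upstream graph produces "raw"; downstream graph starts from a dummy "raw".
up = SimpleNamespace(raw=Node("raw"))
dummy_raw = Node("raw")
down = SimpleNamespace(raw=dummy_raw, mean=Node("mean", upstream=dummy_raw))
link_bypassing_dummies(up, down)
```

After the call, `down.raw` and `up.raw` are the same object and the dummy has no children left, so it can be garbage collected.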
`add_node` and `add_edge` logic for networkx digraphs, since we'd want this as a user interface for adding nodes and edges to the graph. `add_node`, though, is a bit of a misnomer: it could only be used for source nodes, since no other node type can exist without an upstream node.

The linking could also maybe be done by computing the boundary of the graph?
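One possible reading of "boundary", sketched under assumptions: treat each graph as a set of `(upstream_name, downstream_name)` edges, take the downstream graph's input nodes (names that never appear as the downstream end of an edge), and intersect them with the node names of the upstream graph. The helper names `inputs_of` and `link_points` are made up for this example. With networkx, `networkx.node_boundary` may be a closer fit; plain sets are used here only to keep the sketch dependency-free.

```python
def inputs_of(edges):
    """Nodes that only ever appear as the upstream end of an edge."""
    ups = {u for u, _ in edges}
    downs = {d for _, d in edges}
    return ups - downs


def link_points(upstream_edges, downstream_edges):
    """Names where the downstream graph should attach to the upstream one."""
    upstream_nodes = {n for edge in upstream_edges for n in edge}
    return inputs_of(downstream_edges) & upstream_nodes


# Edges are (upstream_name, downstream_name) pairs; the names are illustrative.
upstream = {("source", "raw"), ("raw", "filtered")}
downstream = {("filtered", "mean"), ("mean", "plot")}
print(sorted(link_points(upstream, downstream)))  # ['filtered']
```

This finds the link points without walking attribute names by hand, but it still relies on node names matching across the two graphs.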
Either of these approaches could be put inside a factory which returns the namespace or the graph. One difficulty with this will be the rolling over of state from one pipeline to another. Although maybe it would be best to not worry about this for now.
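A rough sketch of the factory idea, assuming the namespace variant. `CombineLatest` here is a toy class with a per-instance cache and `make_weather_pipeline` is a hypothetical name; neither is the real implementation. Each call builds fresh nodes, so two locations get independent caches without needing `reload`. Rolling state over from one pipeline to another is deliberately left out.

```python
from types import SimpleNamespace


class CombineLatest:
    """Toy node caching the last value per input (hypothetical stand-in)."""

    def __init__(self, n_inputs):
        self.last = [None] * n_inputs

    def update(self, index, value):
        self.last[index] = value
        return tuple(self.last)


def make_weather_pipeline():
    """Build a fresh namespace of nodes; every call gets its own caches."""
    return SimpleNamespace(combined=CombineLatest(n_inputs=2))


site_a = make_weather_pipeline()
site_b = make_weather_pipeline()
site_a.combined.update(0, 21.5)  # reading for one location
site_b.combined.update(0, 30.0)  # does not overwrite the other cache
print(site_a.combined.last, site_b.combined.last)
# [21.5, None] [30.0, None]
```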
It may be nice to expose a graph object which actually expresses the entire graph as a single entity. This could also make adding/removing nodes easier as we would have networkx methods.