Replace subgraph - Githubissues

sumny commented 3 years ago

closes #538 todo:

[ ] varargs
[ ] better docs
[ ] more tests
[ ] complementary function that gives us the IDs of all pipeops that are "between" two (or more) pipeops.

sumny commented 3 years ago

Something that bothers me with vararg inputs: Suppose $replace_subgraph() should work more or less like %>>%, i..e, the ids to replace constitute an actual valid subgraph of the current graph and are in topological order. In this case replacing a single PipeOp that used to be in a gunion with another one and connected to a vararg input will fail:

gr = po("branch", c("pca", "nop")) %>>% gunion(list(po("pca"), po("nop"))) %>>% po("unbranch")
gr$replace_subgraph("pca", substitute = po("ica"))

after having removed pca, the edges, input and output look like this:

gr$edges
   src_id src_channel   dst_id dst_channel
1: branch         nop      nop       input
2:    nop      output unbranch         ...

gr$input
           name train predict  op.id channel.name
1: branch.input     *       * branch        input

gr$output
              name train predict    op.id channel.name
1:      branch.pca     *       *   branch          pca
2: unbranch.output     *       * unbranch       output

i.e., the graph does not remember that there was a second edge connecting to "unbranch" (to be fair, how could he?)

Now suppose we would do it like %>>%, i.e., gr %>>% po("ica"); this would fail because the graphs have mismatching numbers of inputs / outputs; if we ignore this and connect nevertheless, the output graph would look like this:

:(

A workaround would be to do:

gr$replace_subgraph(c("pca", "nop"), substitute = gunion(list(po("ica"), po("nop"))))

This works because the vararg input of unbranch is now an actual free input again that can be connected to. In my opinion this is not satisfying because replacing a single branching option is something that should easily be doable. (With the current $replace_subgraph() it is doable if we would have constructed unbranch without a vararg input.) I feel like for $replace_subgraph() to be this flexible, the implemenation has to go beyond what %>>% does, i.e, one could restrict the substitute to inherit the input and output connections of the subgraph to be replaced and match the pipeops via their ordered ids. But maybe @mb706 @pfistfl have some better ideas?

mb706 commented 2 years ago

Status:

varargs are not treated in a special way: the number of connections of varargs of the prior graph stays the same. otherwise we could end up "removing" a connection entirely.
allow one-to-many-matching: if prior graph has one output or inserted graph has one output, match to number of corresponding inputs
we still don't agree on a name for subgraph completion method, working title completify_graph
export(?) but at least test completify_graph on its own.
definition of completify_graph would be: "list all POs that need to be removed so that all POs not connected to prior input / output are gone" or "... so that all POs in a resulting disconnected component not connected to I / O are removed"
calculating "dangling" edges: see #538
see if we write a helper for type checks that is also used in %>>%.

sebffischer commented 7 months ago

This could be used in torch for transfer learning @mb706

mlr-org / mlr3pipelines

Replace subgraph #560