mlr-org / mlr3pipelines

Dataflow Programming for Machine Learning in R
https://mlr3pipelines.mlr-org.com/
GNU Lesser General Public License v3.0
137 stars 25 forks source link

Replace subgraph #560

Open sumny opened 3 years ago

sumny commented 3 years ago

closes #538 todo:

sumny commented 3 years ago

Something that bothers me with vararg inputs: Suppose $replace_subgraph() should work more or less like %>>%, i..e, the ids to replace constitute an actual valid subgraph of the current graph and are in topological order. In this case replacing a single PipeOp that used to be in a gunion with another one and connected to a vararg input will fail:

gr = po("branch", c("pca", "nop")) %>>% gunion(list(po("pca"), po("nop"))) %>>% po("unbranch")
gr$replace_subgraph("pca", substitute = po("ica"))

after having removed pca, the edges, input and output look like this:

gr$edges
   src_id src_channel   dst_id dst_channel
1: branch         nop      nop       input
2:    nop      output unbranch         ...

gr$input
           name train predict  op.id channel.name
1: branch.input     *       * branch        input

gr$output
              name train predict    op.id channel.name
1:      branch.pca     *       *   branch          pca
2: unbranch.output     *       * unbranch       output

i.e., the graph does not remember that there was a second edge connecting to "unbranch" (to be fair, how could he?)

image

Now suppose we would do it like %>>%, i.e., gr %>>% po("ica"); this would fail because the graphs have mismatching numbers of inputs / outputs; if we ignore this and connect nevertheless, the output graph would look like this:

image

:(

A workaround would be to do:

gr$replace_subgraph(c("pca", "nop"), substitute = gunion(list(po("ica"), po("nop"))))

This works because the vararg input of unbranch is now an actual free input again that can be connected to. In my opinion this is not satisfying because replacing a single branching option is something that should easily be doable. (With the current $replace_subgraph() it is doable if we would have constructed unbranch without a vararg input.) I feel like for $replace_subgraph() to be this flexible, the implemenation has to go beyond what %>>% does, i.e, one could restrict the substitute to inherit the input and output connections of the subgraph to be replaced and match the pipeops via their ordered ids. But maybe @mb706 @pfistfl have some better ideas?

mb706 commented 2 years ago

image

image

image

Status:

sebffischer commented 7 months ago

This could be used in torch for transfer learning @mb706