mlr-org / mlr3pipelines

Dataflow Programming for Machine Learning in R
https://mlr3pipelines.mlr-org.com/
GNU Lesser General Public License v3.0

Smarter avoidance of cloning #701

Closed: sebffischer closed this 9 months ago

sebffischer commented 1 year ago

Although the %>>!% operator exists, I think some performance improvements could be made even for %>>%.

Oftentimes we have something like

po("pca") %>>% po("learner", lrn("regr.rpart"))

I think that %>>% should call substitute() on its arguments and check whether the resulting expressions are symbols or calls. If they are calls, we don't need to clone them, because the objects they produce are not yet bound to a name.
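A minimal base-R sketch of the check described above, assuming nothing from mlr3pipelines: the operator %probe% and the helper operand_kind() are hypothetical names used only for illustration. The operator does not clone or concatenate anything; it only reports whether each operand was written as a bare name or as a call, which is the information a cloning-avoiding %>>% would branch on.

```r
# Hypothetical helper (not part of mlr3pipelines): classify an unevaluated
# operand expression as a bare name, a call, or something else (e.g. a literal).
operand_kind <- function(expr) {
  if (is.name(expr)) "symbol" else if (is.call(expr)) "call" else "other"
}

# Hypothetical infix operator that inspects how each operand was written.
# substitute() must be called inside the operator itself, while e1 and e2
# are still unevaluated promises; otherwise the expressions are lost.
`%probe%` <- function(e1, e2) {
  c(lhs = operand_kind(substitute(e1)), rhs = operand_kind(substitute(e2)))
}

x <- 1
# x is a bare name (bound, so a real operator would have to clone it);
# sqrt(4) is a call (a fresh temporary, so cloning could be skipped).
res <- x %probe% sqrt(4)
print(res)
```

In a real operator, a "symbol" operand would still be cloned, and only "call" operands would be used in place.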

mb706 commented 1 year ago

That is a neat idea, but probably not that easy, since a %>>% b %>>% c is parsed as (a %>>% b) %>>% c: the second operator is called first, and it sees c as a single value and a %>>% b as a composite. We could avoid cloning c, but we would still need to clone a %>>% b. That gives only a linear speedup for a problem with quadratic complexity, so it could very well not be worth it. We could try to parse the whole chain of %>>%s and somehow check whether the values are bound to anything.
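The parse structure, and the idea of inspecting the whole chain at once, can be sketched in base R with quote(), which captures the expression without evaluating it (so %>>%, po() and lrn() need not be defined). The helper flatten_chain() is hypothetical and only collects the operands of a left-nested %>>% chain; it says which operands are bare names, but it cannot tell whether the value produced by a call is referenced elsewhere, and it does not address the hyperparameter concern.

```r
# Quote the chain without evaluating it.
chain <- quote(a %>>% b %>>% c)

# %op% operators in R are left-associative, so the outermost call is
# (a %>>% b) %>>% c: its first argument is a call, its second the symbol c.
outer_lhs <- chain[[2]]
outer_rhs <- chain[[3]]

# Hypothetical helper: flatten a left-nested %>>% chain into its operands.
# Parenthesized sub-chains are calls to `(` and are kept as single operands.
flatten_chain <- function(expr) {
  if (is.call(expr) && identical(expr[[1]], as.name("%>>%"))) {
    c(flatten_chain(expr[[2]]), flatten_chain(expr[[3]]))
  } else {
    list(expr)
  }
}

# A mixed chain: two fresh po(...) calls around a named object g.
mixed <- quote(po("pca") %>>% g %>>% po("learner", lrn("regr.rpart")))
operands <- flatten_chain(mixed)

# Bare names may be bound elsewhere, so only these would need cloning.
bound <- vapply(operands, is.name, logical(1))
print(bound)
```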

This could also get difficult when there are POs with hyperparameters that the user would expect to be cloned.