mlr-org / mlr3pipelines

Dataflow Programming for Machine Learning in R
https://mlr3pipelines.mlr-org.com/
GNU Lesser General Public License v3.0

feat: enable test set usage in graph #750

Closed by sebffischer 6 months ago

sebffischer commented 8 months ago

not quite done

sebffischer commented 6 months ago

How do we deal with the problem where the same rows are used for both training and predicting but are preprocessed differently in each case? In this scenario, we cannot simply pass ONE task to the learner.

We can deal with this problem in two ways:
1) We can simply ensure that the test rows are disjoint from the predict rows (seems simple enough).
2) We can make the private $.train method of the learner accept an additional argument task_predict.

I guess option 1) would be easier and good enough.
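The following is a minimal, language-agnostic sketch of option 1). It is written in Python rather than R. The helper merge_preprocessed and its dict-of-row-IDs inputs are hypothetical and are not part of mlr3 or mlr3pipelines. The helper combines the rows that were preprocessed in training mode with the test rows that were preprocessed in predict mode. It refuses to proceed when a row ID appears in both sets, because that row would have two different preprocessed versions and could not be placed in ONE task.

```python
def merge_preprocessed(train_rows, test_rows):
    """Combine train-mode and predict-mode outputs into one table keyed by row ID.

    train_rows: dict mapping row ID -> features produced in training mode
    test_rows:  dict mapping row ID -> features produced in predict mode
    (Hypothetical helper, for illustration only.)
    """
    # Option 1): require the two row sets to be disjoint.
    overlap = set(train_rows) & set(test_rows)
    if overlap:
        raise ValueError(
            f"rows used for both training and testing: {sorted(overlap)}"
        )
    # With disjoint IDs, no row is overwritten, so a plain union is safe.
    merged = dict(train_rows)
    merged.update(test_rows)
    return merged


if __name__ == "__main__":
    # Disjoint row IDs: the outputs can be combined into one table.
    print(merge_preprocessed({1: [0.5], 2: [-0.5]}, {3: [1.5]}))
    # Overlapping row ID 1: two different versions of the same row, so this raises.
    try:
        merge_preprocessed({1: [0.5]}, {1: [0.7]})
    except ValueError as err:
        print("error:", err)
```

The design choice here mirrors the trade-off in the comment. Rejecting overlapping rows keeps the learner's interface unchanged. Option 2) would instead require the learner to receive the two differently preprocessed versions separately.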

Also: it seems like the task's rbind does not allow overwriting already existing rows.
