mlr-org / mlr3

mlr3: Machine Learning in R - next generation
https://mlr3.mlr-org.com
GNU Lesser General Public License v3.0

Task cbind breaks when task's backend has primary_key different to `..row_id` #961

Open sebffischer opened 10 months ago

sebffischer commented 10 months ago
```r
library(mlr3verse)
#> Loading required package: mlr3
library(data.table)

d = data.table(
  x = factor(letters[1:10]),
  y = rnorm(10),
  my_key = 1:10
)

backend = as_data_backend(d, primary_key = "my_key")

task = as_task_regr(backend, target = "y")

learner = as_learner(ppl("robustify") %>>% lrn("regr.rpart"))

learner$train(task)
#> Error: All backends to rbind must have the primary_key 'my_key'
#> This happened PipeOp encode's $train()
```

Created on 2023-08-31 with reprex v2.0.2

mb706 commented 10 months ago

probably an issue with Task$cbind()

sebffischer commented 10 months ago

When a data.frame is passed to Task$cbind(), as_data_backend.data.frame() is called, which automatically sets the primary key to ..row_id, regardless of the primary key of the task's backend.
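
A minimal sketch of that default, independent of the pipeline in the reprex. The variable names are only illustrative; the calls are `as_data_backend()` and the backend's `$primary_key` field from mlr3:

```r
library(mlr3)
library(data.table)

d = data.table(x = 1:3, my_key = 11:13)

# No primary_key given: a new key column named "..row_id" is created.
b_default = as_data_backend(d)
print(b_default$primary_key)

# Explicit primary_key: the existing column is used as the key.
b_custom = as_data_backend(d, primary_key = "my_key")
print(b_custom$primary_key)
```

So a backend built from a plain data.frame inside $cbind() ends up keyed by ..row_id even when the task's own backend is keyed by my_key, which is the mismatch the error message complains about.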

sebffischer commented 10 months ago

we could handle both cases:

  1. A data.frame is passed to $cbind() --> we create the primary key column under the name of the existing primary_key of the task's backend.
  2. A backend is passed to $cbind() --> we can wrap it in DataBackendRename if the primary keys don't match and the primary key of the task's backend is not already a column name in the backend passed to $cbind().
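
A rough sketch of what case 1 could look like from the outside. `backend_for_cbind()` is a hypothetical helper, not part of mlr3, and it assumes the rows of the new data are in the same order as `task$row_ids`; case 2 (renaming the key of an existing backend) is not covered here:

```r
library(mlr3)
library(data.table)

# Hypothetical helper (not an mlr3 function): build a backend for new columns
# whose primary key has the same name as the key of the task's backend.
backend_for_cbind = function(task, data) {
  pk = task$backend$primary_key
  data = copy(as.data.table(data)) # do not modify the caller's table
  if (!pk %in% names(data)) {
    # Assumption: rows of `data` are ordered like task$row_ids.
    set(data, j = pk, value = task$row_ids)
  }
  as_data_backend(data, primary_key = pk)
}

d = data.table(
  x = factor(letters[1:10]),
  y = rnorm(10),
  my_key = 1:10
)
backend = as_data_backend(d, primary_key = "my_key")
task = as_task_regr(backend, target = "y")

new_cols = data.table(z = runif(10))
b = backend_for_cbind(task, new_cols)
print(b$primary_key)
```

With the key names aligned like this, the new backend uses my_key rather than ..row_id, so a later rbind of backends would no longer see two different primary keys. Whether the fix belongs in Task$cbind() itself or in the calling PipeOp is left open here.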