mlr-org / mlr3

mlr3: Machine Learning in R - next generation
https://mlr3.mlr-org.com
GNU Lesser General Public License v3.0

set_col_roles has confusing error for user-defined roles #1040

Open tdhock opened 2 weeks ago

tdhock commented 2 weeks ago

Related to https://github.com/mlr-org/mlr3/issues/770. Hi! For https://github.com/tdhock/mlr3resampling I needed to define a new column role "subset". This fails with set_col_roles, but it is possible by assigning to col_roles directly:

> my_iris <- data.frame(iris, my_subset=1:3)
> itask <- mlr3::TaskClassif$new("iris", my_iris, target="Species")
> itask$set_col_roles("my_subset","subset")
Error in task_set_roles(private$.col_roles, cols, roles, add_to, remove_from) : 
  Assertion on 'roles' failed: Must be a subset of {'feature','target','name','order','stratum','group','weight'}, but has additional elements {'subset'}.
> itask$col_roles$subset <- "my_subset"
> str(itask$col_roles)
List of 8
 $ feature: chr [1:5] "Petal.Length" "Petal.Width" "Sepal.Length" "Sepal.Width" ...
 $ target : chr "Species"
 $ name   : chr(0) 
 $ order  : chr(0) 
 $ stratum: chr(0) 
 $ group  : chr(0) 
 $ weight : chr(0) 
 $ subset : chr "my_subset"

To me, the following error message implies that "subset" is not allowed as a role: "Assertion on 'roles' failed: Must be a subset of {'feature','target','name','order','stratum','group','weight'}, but has additional elements {'subset'}." But "subset" is actually allowed if it is created via $ instead of set_col_roles, which I find confusing.

The ?Task docs quoted below say that set_col_roles "provides a convenient alternative", which I take to mean that it should work the same as "just modify the list". But the example code above shows that set_col_roles does some additional checking, so the two are not actually equivalent.

          ‘col_roles’ is a named list whose elements are named by
          column role and each element is a ‘character()’ vector of
          column names. To alter the roles, just modify the list, e.g.
          with R's set functions (intersect(), setdiff(),
          union(), ...). The method $set_col_roles provides a
          convenient alternative to assign columns to roles.
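
For a built-in role, the two routes described in this documentation excerpt can be compared directly. The following is a minimal sketch, assuming mlr3 is installed; the variable names task_list and task_method are only illustrative. It removes my_subset from the "feature" role once by modifying the list with setdiff() and once via the remove_from argument of set_col_roles, then checks that both tasks end up with the same features.

```r
library(mlr3)

# Same data as in the example above: iris plus an extra column.
my_iris <- data.frame(iris, my_subset = 1:3)

# Route 1: modify the col_roles list directly with a set function.
task_list <- TaskClassif$new("iris", my_iris, target = "Species")
task_list$col_roles$feature <- setdiff(task_list$col_roles$feature, "my_subset")

# Route 2: use set_col_roles with a built-in role.
task_method <- TaskClassif$new("iris", my_iris, target = "Species")
task_method$set_col_roles("my_subset", remove_from = "feature")

# Both routes should leave the same feature columns.
print(setequal(task_list$col_roles$feature, task_method$col_roles$feature))
```

For built-in roles the two routes appear interchangeable; the discrepancy reported here only shows up when the role name is not one of the defaults.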

To fix this issue, could you please either turn off the role checking/error in set_col_roles, or document that set_col_roles only checks for the default roles, while the user can still assign additional roles using $ etc.?
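
Since the behavior may differ between mlr3 versions, here is a small sketch for checking how an installed version treats a user-defined role through each route. The helper runs_ok and the names via_method and via_list are hypothetical and not part of mlr3; the sketch makes no assumption about which route succeeds and only reports what happens.

```r
library(mlr3)

# Hypothetical helper: TRUE if the expression runs without error, FALSE otherwise.
runs_ok <- function(expr) {
  tryCatch({
    force(expr)
    TRUE
  }, error = function(e) FALSE)
}

my_iris <- data.frame(iris, my_subset = 1:3)
new_task <- function() TaskClassif$new("iris", my_iris, target = "Species")

# Route 1: assign the user-defined role via set_col_roles.
t1 <- new_task()
via_method <- runs_ok(t1$set_col_roles("my_subset", roles = "subset"))

# Route 2: assign the user-defined role by modifying the col_roles list.
t2 <- new_task()
via_list <- runs_ok(t2$col_roles$subset <- "my_subset")

cat("set_col_roles accepted 'subset': ", via_method, "\n", sep = "")
cat("col_roles$subset accepted 'subset': ", via_list, "\n", sep = "")
```

With the version used in this report, the first line would be expected to print FALSE and the second TRUE, which is exactly the inconsistency described above.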