mlr-org / mlr3pipelines

Dataflow Programming for Machine Learning in R
https://mlr3pipelines.mlr-org.com/
GNU Lesser General Public License v3.0

Question: behavior of pipe operations with 'col_roles' tags in tasks #728

jpconnel closed this 1 year ago

jpconnel commented 1 year ago

Hello,

I was wondering what the expected behavior of pipe operators is for columns that have been assigned a column role. For example, if a 'weights' column is used for sample weights and the scale pipe operator is applied, will the weights be scaled? Or, if the 'encode' pipe operator is used and the target variable is a factor, will the target variable be encoded as well?

mb706 commented 1 year ago

Most PipeOps operate on feature columns. They sometimes take the values of the target column into account where it makes sense (e.g. PipeOpSubsample can do stratified sampling and therefore needs target values).
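To illustrate why a subsampling step needs the target, here is a minimal Python sketch. It is a conceptual model, not mlr3pipelines code: the function name `stratified_subsample` and the list-of-dicts data layout are hypothetical. Rows are grouped by target class and sampled within each group, so class proportions are preserved. The target is only read, and no column values are changed.

```python
import random
from collections import defaultdict


def stratified_subsample(rows, target, frac, seed=0):
    """Hypothetical helper: keep about `frac` of the rows within each target class.

    The target column is only read (to group rows); no values are modified.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for row in rows:
        by_class[row[target]].append(row)  # group rows by their target value
    sample = []
    for cls_rows in by_class.values():
        k = max(1, round(len(cls_rows) * frac))  # at least one row per class
        sample.extend(rng.sample(cls_rows, k))   # sample within the class
    return sample


# 80 rows of class "a" and 20 rows of class "b"
rows = [{"x": i, "y": "a"} for i in range(80)] + [{"x": i, "y": "b"} for i in range(20)]
sample = stratified_subsample(rows, "y", frac=0.5)
```

With `frac=0.5`, the sample keeps 40 rows of class "a" and 10 of class "b", the same 4:1 ratio as the input. An unstratified 50% sample would only match that ratio on average.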

Other column roles are very rarely used or modified. I don't think there is any PipeOp that takes the weights column into account (PipeOpScale from your question does not), and only a few PipeOps can modify the weights (PipeOpClassWeights and PipeOpColRoles). There are also a few others that operate on the target column.
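The feature-only behavior described above can be sketched in Python as a rough conceptual model. It is not the PipeOpScale implementation. The `roles` mapping is a hypothetical stand-in for a task's column roles, and `scale_features` is a hypothetical name. Only columns listed under the "feature" role are centered and scaled. Columns with the target or weight role are copied through unchanged.

```python
from statistics import mean, pstdev


def scale_features(rows, roles):
    """Hypothetical model of a feature-only preprocessing step.

    Centers and scales every column listed under roles["feature"]; columns
    with any other role (e.g. target, weight) are left untouched.
    """
    stats = {}
    for col in roles["feature"]:
        values = [row[col] for row in rows]
        sd = pstdev(values)  # population SD, chosen here for simplicity
        stats[col] = (mean(values), sd if sd > 0 else 1.0)  # avoid division by 0
    out = []
    for row in rows:
        new_row = dict(row)  # copy, so target and weight values are kept as-is
        for col, (mu, sd) in stats.items():
            new_row[col] = (row[col] - mu) / sd
        out.append(new_row)
    return out


rows = [
    {"x": 1.0, "w": 2.0, "y": "a"},
    {"x": 3.0, "w": 5.0, "y": "b"},
]
roles = {"feature": ["x"], "target": ["y"], "weight": ["w"]}
scaled = scale_features(rows, roles)
```

Here the feature `x` becomes `[-1.0, 1.0]`, while the weights `w` stay `[2.0, 5.0]` and the factor-like target `y` stays `["a", "b"]`.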

jpconnel commented 1 year ago

Thank you, the detailed response is much appreciated :)