mlr-org / mlr3

mlr3: Machine Learning in R - next generation
https://mlr3.mlr-org.com
GNU Lesser General Public License v3.0

Group by calculation inside the pipe #897

Open MislavSag opened 1 year ago

MislavSag commented 1 year ago

Hi,

I have recently been trying to build an mlr3 pipeline (graph) for predicting financial outcomes (financial time series).

In the preprocessing step, I often need to apply a function on a group-by basis. More concretely, I need to apply a function by month.

I have already opened an issue with an example, winsorization by groups: https://github.com/mlr-org/mlr3pipelines/issues/583 In that example, I want to winsorize the data within every month (or every quarter). It doesn't make much sense to winsorize the data across the time dimension, so I need a month column (or quarter column). But the month column is not a feature, and it is not a target. I can set the role of that column to group at the beginning, but how should I use it then? I can get the group column if I use .train_task in a preprocessing pipe, but I actually need the .train_dt method.

The problem is more general because instead of winsorization, I could use scaling by group or any other function.
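To make the per-group operation concrete, here is a minimal base R sketch of what I mean by winsorizing within each month, outside of any pipeline. The `winsorize` helper, the column names `month` and `ret`, the 5%/95% cutoffs, and the simulated data are all assumptions for illustration only.

```r
# Hypothetical helper: clamp a numeric vector to its own lower/upper quantiles.
winsorize <- function(x, probs = c(0.05, 0.95)) {
  limits <- quantile(x, probs = probs, na.rm = TRUE, names = FALSE)
  pmin(pmax(x, limits[1]), limits[2])
}

# Simulated returns for two months with very different volatility.
set.seed(1)
dt <- data.frame(
  month = rep(c("2023-01", "2023-02"), each = 50),
  ret   = c(rnorm(50, sd = 1), rnorm(50, sd = 10))
)

# ave() applies the function separately within each month and
# returns the results in the original row order.
dt$ret_wins <- ave(dt$ret, dt$month, FUN = winsorize)

# For comparison: winsorizing across the whole time dimension at once.
dt$ret_wins_all <- winsorize(dt$ret)
```

With the pooled version, the cutoffs are driven almost entirely by the high-volatility month, which is why I want the limits computed per group. Scaling by group would follow the same pattern with a different function passed to `ave()`.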

I kindly ask for your recommendation: what is the best way to implement the above pipe?

The solutions I thought about:

  1. Set the month (or, more generally, date) column to group. Then, if group is set, apply the function (say, scaling) on a group-by basis.
  2. Use the month (or date) column as a feature, but exclude this column from other preprocessing operations (for example, we don't want to scale dates).
  3. Set row ids to the date and use that for grouping.
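As a rough sketch of the first option, this is how I understand the group role could be set and read back on a task. It assumes the mlr3 package is installed; the toy data and the task id `"returns"` are made up, and I have not checked whether reusing the group role this way interferes with grouped resampling, which is what the role is normally used for.

```r
library(mlr3)

# Toy data: y is the target, x a feature, month the grouping column.
df <- data.frame(
  y     = c(0.1, -0.2, 0.3, 0.0, 0.5, -0.4),
  x     = c(1.0, 2.0, 3.0, 4.0, 5.0, 6.0),
  month = rep(1:2, each = 3)
)

task <- as_task_regr(df, target = "y", id = "returns")

# Move `month` out of the features and into the group role.
task$set_col_roles("month", roles = "group")

# `month` is no longer a feature ...
print(task$feature_names)

# ... but stays reachable through the task, as a table of row ids and groups.
print(task$groups)
```

Since the grouping information lives on the task rather than in the feature table, it is visible in `.train_task` but not in `.train_dt`, which is exactly the difficulty described above.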

EDIT:

Maybe I can put the question more generally: what approach do you recommend if we want to use some columns in preprocessing, but don't want to use them as features or give them other column roles?

I am aware of the mlr3temporal package, which inherits from the Task class and creates a new TaskForecast class. Maybe I should use this task in my case? And what if I have id and date columns: should I create my own task (TaskPanel, for example) by inheriting from Task?