mlr-org / mlr3

mlr3: Machine Learning in R - next generation
https://mlr3.mlr-org.com
GNU Lesser General Public License v3.0

Labeling parameters inside pipeline #735

Open MislavSag opened 2 years ago

MislavSag commented 2 years ago

Hi,

A TaskClassif object requires the target column as an argument, which implies that the labels are already given. In my setup, however, labeling is part of the prediction problem. I am working with financial time series data, and there are several recommended ways to label such data: for example, the fixed-return method, the fixed-time method, the triple-barrier method, trend scanning, etc. See here for more: https://youtu.be/jk7A4yXKxUk

Now, I would like to try multiple labeling approaches and include this comparison inside a pipeline using mlr3pipelines. It seems to me that this is not possible, because a Task requires the target as input. As a simple example, I could try to predict returns after a day, a week, and a month, and I would like to see whether prediction accuracy differs across horizons.

The only thing that comes to my mind is to create a separate task for each labeling and use the multiple tasks in a benchmark. Is this the best approach?
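A minimal sketch of the one-task-per-labeling idea, under stated assumptions: the price series is synthetic, the two features and the helper `make_task()` are hypothetical, and the labeling is a simple fixed-horizon up/down rule rather than any of the methods named above. It only illustrates how several differently labeled tasks can share one benchmark design.

```r
library(mlr3)

set.seed(1)

# Synthetic daily log-price series (hypothetical stand-in for real data).
n = 600
price = cumsum(rnorm(n, mean = 0, sd = 0.01))

# Two simple features that only use information available at time t.
ret_1 = c(NA, diff(price))                    # last one-day log return
ret_5 = c(rep(NA, 5), diff(price, lag = 5))   # last five-day log return

# Hypothetical helper: build one classification task for horizon h.
# The label is "up" if the price h steps ahead is higher than today.
make_task = function(h) {
  future = c(price[-seq_len(h)], rep(NA, h))
  label = factor(ifelse(future > price, "up", "down"), levels = c("down", "up"))
  df = data.frame(ret_1 = ret_1, ret_5 = ret_5, label = label)
  df = df[complete.cases(df), ]   # drop rows without features or label
  rownames(df) = NULL
  as_task_classif(df, target = "label", id = paste0("horizon_", h))
}

# One task per labeling horizon: a day, a week, and a month of trading days.
horizons = c(1, 5, 21)
tasks = lapply(horizons, make_task)

# Cross every task with every learner on an uninstantiated holdout split.
design = benchmark_grid(
  tasks = tasks,
  learners = lrns(c("classif.featureless", "classif.rpart")),
  resamplings = rsmp("holdout")
)
bmr = benchmark(design)

# One row per task/learner combination with the aggregated accuracy.
scores = bmr$aggregate(msr("classif.acc"))
print(as.data.frame(scores)[, c("task_id", "learner_id", "classif.acc")])
```

Note that a random holdout split ignores the temporal order of the observations, so for real financial data a time-aware resampling scheme would probably be a better choice than the one used in this sketch.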

mllg commented 2 years ago

I am not familiar with this particular problem. Multi-label classification is not supported (yet). But if it is only about creating a task, the labels can be set to NA.

Does that help?
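A minimal sketch of what a task with NA labels could look like. The feature values, the column name `label`, and the class levels are hypothetical, and the sketch assumes that the task constructor accepts a factor target whose values are all missing as long as its levels are declared.

```r
library(mlr3)

# Hypothetical feature table without any labels yet.
df = data.frame(
  ret_1 = c(0.010, -0.020, 0.005, 0.013),
  ret_5 = c(0.030, -0.010, 0.020, -0.004)
)

# Placeholder target: every value is NA, but the class levels are declared
# up front so the column is still a two-level factor.
df$label = factor(rep(NA_character_, nrow(df)), levels = c("down", "up"))

task = as_task_classif(df, target = "label", id = "unlabeled")

print(task$class_names)   # the declared levels
print(task$truth())       # the (all missing) target values
```

Such a task only carries the features and the declared classes; a learner still needs actual, non-missing labels before it can be trained on it.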

MislavSag commented 2 years ago

Is it possible to set the labels to NA and then create the labels after we have created a Task?