mlr-org / mlr3extralearners

Extra learners for use in mlr3.
https://mlr3extralearners.mlr-org.com/

Update lightgbm with metric parameter #326

Open Vinnish-A opened 7 months ago

Vinnish-A commented 7 months ago

Algorithm

lightgbm

Package

lightgbm

Non-Supported parameter

It is strange to see that in lightgbm.regr.R there is a "metric parameters" section in which metric_freq can be found, while the metric parameter itself is not supported.
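
For illustration, a minimal check of the mismatch (a sketch, assuming a current installation of mlr3 and mlr3extralearners): metric_freq appears in the learner's parameter set while metric does not.

library(mlr3)
library(mlr3extralearners)

learner = lrn("regr.lightgbm")

# metric_freq is listed among the learner's parameters ...
"metric_freq" %in% learner$param_set$ids()  # expected: TRUE

# ... but metric is not, so it cannot be set through the param_set
"metric" %in% learner$param_set$ids()       # expected: FALSE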

sebffischer commented 7 months ago

Thanks for raising awareness of this issue! Can you please be a bit more specific? There are a huge number of learners in this package and I don't know all of them in detail, so your help would be much appreciated here :)

grantmcdermott commented 6 months ago

I believe the OP means this: https://lightgbm.readthedocs.io/en/latest/Parameters.html#metric-parameters

In the native LightGBM API, you can set your evaluation metric (MSE, MAE, etc.) alongside the main objective type (regression, classification, etc.). E.g. if you have skewed data, you might want to set metric = "mae" as the evaluation metric for early stopping (instead of the default MSE).
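
As a concrete sketch of this with the R lightgbm package (train_x, train_y, valid_x, valid_y are hypothetical placeholders for your own data), the native API lets the evaluation metric differ from the training objective:

library(lightgbm)

# Hypothetical datasets; in practice built from your own matrices
dtrain = lgb.Dataset(data = as.matrix(train_x), label = train_y)
dvalid = lgb.Dataset(data = as.matrix(valid_x), label = valid_y)

params = list(
  objective = "regression",  # main objective (trained on MSE)
  metric    = "mae"          # evaluation metric used for early stopping
)

booster = lgb.train(
  params = params,
  data = dtrain,
  nrounds = 1000,
  valids = list(valid = dvalid),
  early_stopping_rounds = 50
)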

sebffischer commented 6 months ago

Thanks for clarifying! So essentially just the metric parameter is missing from the parameter set?

Vinnish-A commented 6 months ago

I'm sorry I didn't see your reply earlier. I encountered this problem while migrating a Python-built lightgbm model to R. The following is a simple example: of course I don't run into this problem when using the native lightgbm package, but I do if I want to use it under the mlr3 framework.

library(mlr3)
library(mlr3extralearners)

learner = lrn("regr.lightgbm")
learner$param_set$set_values(
  num_iterations = 1000,
  objective = "poisson",
  num_leaves = 30,
  learning_rate = 0.001,
  feature_fraction = 0.8,
  bagging_fraction = 0.9,
  bagging_seed = 33,
  poisson_max_delta_step = 0.8,
  metric = "poisson",  # Here! metric is not in the param_set, so this fails
  early_stopping = TRUE,
  early_stopping_rounds = 400
)

The metric parameter is currently not encapsulated, so I added it myself.

Vinnish-A commented 6 months ago

Of course, the change I made is pretty crude, because this parameter is only needed in rare cases. My broader point is about model customization: if parameters could be passed through from the global environment to the wrapped learner (when a parameter is not encapsulated, or in cases like the one below), then the problem could be solved within the mlr3 framework (while still benefiting from things like mlr3's convenient hyperparameter tuning), and you would get the same flexibility as when using the original learner directly.

# metric parameters
metric = p_fct(default = "poisson", levels = c("poisson", "l1", "l2"), tags = "train"),
metric_freq = p_int(default = 1L, lower = 1L, tags = "train"),

See below for some special cases:

import lightgbm as lgb

# params, callbacks, X_train, y_train, X_val, y_val are defined elsewhere
trn_data = lgb.Dataset(X_train, label=y_train)
val_data = lgb.Dataset(X_val, label=y_val)

model = lgb.train(params, trn_data, num_boost_round=1000,
                  valid_sets=[trn_data, val_data],  # See here!
                  callbacks=callbacks)

This is really tricky, and I understand that it is different from mlr3's philosophy. But what I am saying is that it would be better if you could pass parameters through from the global environment instead of having to wrap them all yourself. (For example, xgboost has been upgraded to version 2.0 and some of the previous parameters have been replaced by new ones; if we had global parameter pass-through, there would be no need to update the wrapper.)

sebffischer commented 6 months ago

@Vinnish-A sorry I don't completely understand what you are trying to say.

However, while the metric parameter is not exposed, I believe you can instead use the eval parameter; see the documentation of lgb.train. Please confirm whether this is what you are looking for.
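
A minimal sketch of this suggestion, assuming the learner forwards an eval argument to lgb.train() (if eval is not in the learner's param_set, this call would error):

library(mlr3)
library(mlr3extralearners)

learner = lrn("regr.lightgbm")
# Assumption: `eval` is part of the param_set and is passed on to lgb.train(),
# where it accepts metric names such as "poisson" or "l1"
learner$param_set$set_values(
  objective = "poisson",
  eval = "poisson",
  early_stopping = TRUE,
  early_stopping_rounds = 400
)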

sebffischer commented 6 months ago

Also note that in principle early stopping / validation is supported and exposed through the early_stopping parameter. Unfortunately, this does not yet work with mlr3pipelines, but I am currently working on a PR for this problem.