microsoft / LightGBM

A fast, distributed, high performance gradient boosting (GBT, GBDT, GBRT, GBM or MART) framework based on decision tree algorithms, used for ranking, classification and many other machine learning tasks.
https://lightgbm.readthedocs.io/en/latest/
MIT License

[R-package] Examples to tune lightGBM using grid search #4642

Closed adithirgis closed 3 years ago

adithirgis commented 3 years ago

Not sure where I could ask this. Are there tutorials / resources for tuning LightGBM using grid search or any other methods in R? I want to tune the hyperparameters in LightGBM using the lightgbm R package directly, without using tidymodels. I use this resource for now - https://www.kaggle.com/andrewmvd/lightgbm-in-r. Thank you!

jameslamb commented 3 years ago

Thanks for using LightGBM!

We don't have any example documentation of performing grid search specifically in the R package, but you could consult the following:
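To make the idea concrete, here is a minimal, hedged sketch of a manual grid search with lgb.cv(): build a grid of candidate values with base R's expand.grid(), run cross-validation for each row, and compare the recorded scores. The dataset object `dtrain` and the specific parameter values are placeholders, not a recommendation.

```r
library(lightgbm)

# Placeholder grid of candidate parameter values (adjust to your problem).
param_grid <- expand.grid(
  num_leaves    = c(15, 31, 63),
  learning_rate = c(0.05, 0.1)
)

results <- lapply(seq_len(nrow(param_grid)), function(i) {
  params <- list(
    objective     = "regression",
    metric        = "l2",
    num_leaves    = param_grid$num_leaves[i],
    learning_rate = param_grid$learning_rate[i]
  )
  cv <- lgb.cv(
    params = params,
    data = dtrain,                 # an lgb.Dataset built from your training data
    nrounds = 1000L,
    nfold = 5L,
    early_stopping_rounds = 20L,
    verbose = -1L
  )
  data.frame(
    num_leaves    = params$num_leaves,
    learning_rate = params$learning_rate,
    best_iter     = cv$best_iter,   # iteration chosen by early stopping
    best_score    = cv$best_score   # CV score at that iteration
  )
})
results <- do.call(rbind, results)
results[order(results$best_score), ]  # for l2, lower is better
```

The same loop works with any set of parameters; only the columns of `param_grid` and the `params` list need to change.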

adithirgis commented 3 years ago

Thank you for the prompt response. I tried something similar; I'm not sure if it is elegant.

library(lightgbm)
library(Matrix)
library(MLmetrics)
library(ggplot2)  # needed for the plot below

file_shared <- data
train_ind <- sample(seq_len(nrow(file_shared)), size = (nrow(file_shared) * 0.75))
train_x <- as.matrix(file_shared[train_ind, c("hour", "PA_RH", "PA_Temp", "PA_CF_ATM") ])
train_y <- as.matrix(file_shared[train_ind, "BAM" ])
test_x <- as.matrix(file_shared[-train_ind, c("hour", "PA_RH", "PA_Temp", "PA_CF_ATM") ])
test_y <- as.matrix(file_shared[-train_ind, "BAM" ])
dtrain <- lgb.Dataset(train_x, label = train_y)

lgb_grid <- list(objective = "regression",
                metric = "l2", 
                min_sum_hessian_in_leaf = 1,
                feature_fraction = 0.7,
                bagging_fraction = 0.7, # cannot write c(0, 0.5, 0.7)
                bagging_freq = 5,
                max_bin = 50,
                lambda_l1 = 8,
                lambda_l2 = 1.3,
                min_data_in_bin = 100,
                min_gain_to_split = 10,
                min_data_in_leaf = 30,
                is_unbalance = TRUE)

lgb_normalizedgini <- function(preds, dtrain){
  actual <- getinfo(dtrain, "label")
  score  <- NormalizedGini(preds, actual)
  return(list(name = "gini", value = score, higher_better = TRUE))
}

lgb_model_cv <- lgb.cv(params = lgb_grid, data = dtrain, learning_rate = 0.02, num_leaves = 25,
                       num_threads = 2 , nrounds = 7000, early_stopping_rounds = 50,
                       eval_freq = 20, eval = lgb_normalizedgini, nfold = 5, stratified = TRUE)
best_iter <- lgb_model_cv$best_iter

lgb_model <- lgb.train(params = lgb_grid, data = dtrain, learning_rate = 0.02,
                       num_leaves = 25, num_threads = 2 , nrounds = best_iter,
                       eval_freq = 20, eval = lgb_normalizedgini)

# test_x is a matrix, so collect the observed and predicted values in a data frame
results <- data.frame(
  BAM = as.numeric(test_y),
  pred_lightgbm = predict(lgb_model, test_x)
)

ggplot(results, aes(BAM, pred_lightgbm)) + geom_point() + geom_smooth(method = "lm")
summary(lm(BAM ~ pred_lightgbm, data = results))
mean(abs((results$BAM - results$pred_lightgbm) / results$BAM)) * 100  # MAPE (%)

jameslamb commented 3 years ago

Looks like a fine approach to me! By trying different combinations of parameters in the object you've called lgb_grid, you can use the results from lgb.cv() to estimate the expected performance of your model for each set of parameter values.
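As a hedged sketch of that last step: once lgb.cv() has identified a good parameter combination and stopping point, refit a single final model on the full training data. The `dtrain` object and the parameter values below are placeholders standing in for whichever combination scored best.

```r
library(lightgbm)

# Placeholder: the parameter combination that scored best in cross-validation.
best_params <- list(
  objective     = "regression",
  metric        = "l2",
  num_leaves    = 25L,
  learning_rate = 0.02
)

# Re-run CV with the chosen parameters to get the stopping iteration.
cv <- lgb.cv(params = best_params, data = dtrain, nrounds = 7000L,
             nfold = 5L, early_stopping_rounds = 50L, verbose = -1L)

# Refit on all training data, stopping at the CV-selected iteration.
final_model <- lgb.train(params = best_params, data = dtrain,
                         nrounds = cv$best_iter)
```

This mirrors the lgb.cv() / lgb.train() pattern already used in the code above.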

Anything else we can help with?

adithirgis commented 3 years ago

Thanks again! No, I'll try something out. I'll watch this issue in case someone comes up with a method.

jameslamb commented 3 years ago

Ok sounds good! We actually try to keep the list of open issues as small as possible (to focus maintainers' attention), so I'm going to close this for now.

If you'd be interested in contributing a vignette on hyperparameter tuning with the {lightgbm} R package in the future, I'd be happy to help with any questions you have on contributing!

Once the 3.3.0 release (#4310) makes it to CRAN, we'll focus on converting the existing R package demos to vignettes (@mayer79 has already started this in #3946), and I think a hyperparameter tuning one would be very valuable!

github-actions[bot] commented 1 year ago

This issue has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.