microsoft / LightGBM

A fast, distributed, high performance gradient boosting (GBT, GBDT, GBRT, GBM or MART) framework based on decision tree algorithms, used for ranking, classification and many other machine learning tasks.
https://lightgbm.readthedocs.io/en/latest/
MIT License

[R-package] Examples to tune lightGBM using grid search #4642

Closed adithirgis closed 3 years ago

adithirgis commented 3 years ago

Not sure where I could ask this. Are there tutorials / resources for tuning LightGBM using grid search or any other methods in R? I want to tune the hyperparameters in LightGBM using the lightgbm R package directly, without using tidymodels. I use this resource for now - https://www.kaggle.com/andrewmvd/lightgbm-in-r. Thank you!

jameslamb commented 3 years ago

Thanks for using LightGBM!

We don't have any example documentation of performing grid search specifically in the R package, but you could consult the following:
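To make the idea concrete, here is a minimal, hedged sketch of a manual grid search with lgb.cv(): build a grid of candidate values with base R's expand.grid(), run cross-validation for each row, and compare the recorded scores. The dataset object `dtrain` and the specific parameter values are placeholders, not a recommendation.

```r
library(lightgbm)

# Placeholder grid of candidate parameter values (adjust to your problem).
param_grid <- expand.grid(
  num_leaves    = c(15, 31, 63),
  learning_rate = c(0.05, 0.1)
)

results <- lapply(seq_len(nrow(param_grid)), function(i) {
  params <- list(
    objective     = "regression",
    metric        = "l2",
    num_leaves    = param_grid$num_leaves[i],
    learning_rate = param_grid$learning_rate[i]
  )
  cv <- lgb.cv(
    params = params,
    data = dtrain,                 # an lgb.Dataset built from your training data
    nrounds = 1000L,
    nfold = 5L,
    early_stopping_rounds = 20L,
    verbose = -1L
  )
  data.frame(
    num_leaves    = params$num_leaves,
    learning_rate = params$learning_rate,
    best_iter     = cv$best_iter,   # iteration chosen by early stopping
    best_score    = cv$best_score   # CV score at that iteration
  )
})
results <- do.call(rbind, results)
results[order(results$best_score), ]  # for l2, lower is better
```

The same loop works with any set of parameters; only the columns of `param_grid` and the `params` list need to change.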

adithirgis commented 3 years ago

Thank you for the prompt response. I tried something similar; I'm not sure if it is elegant.

library(lightgbm)
library(Matrix)
library(MLmetrics)
library(ggplot2)  # needed for the plot below

file_shared <- data
train_ind <- sample(seq_len(nrow(file_shared)), size = (nrow(file_shared) * 0.75))
train_x <- as.matrix(file_shared[train_ind, c("hour", "PA_RH", "PA_Temp", "PA_CF_ATM") ])
train_y <- as.matrix(file_shared[train_ind, "BAM" ])
test_x <- as.matrix(file_shared[-train_ind, c("hour", "PA_RH", "PA_Temp", "PA_CF_ATM") ])
test_y <- as.matrix(file_shared[-train_ind, "BAM" ])
dtrain <- lgb.Dataset(train_x, label = train_y)

lgb_grid <- list(objective = "regression",
                metric = "l2", 
                min_sum_hessian_in_leaf = 1,
                feature_fraction = 0.7,
                bagging_fraction = 0.7, # cannot write c(0, 0.5, 0.7)
                bagging_freq = 5,
                max_bin = 50,
                lambda_l1 = 8,
                lambda_l2 = 1.3,
                min_data_in_bin = 100,
                min_gain_to_split = 10,
                min_data_in_leaf = 30,
                is_unbalance = TRUE)

lgb_normalizedgini <- function(preds, dtrain){
  actual <- getinfo(dtrain, "label")
  score  <- NormalizedGini(preds, actual)
  return(list(name = "gini", value = score, higher_better = TRUE))
}

lgb_model_cv <- lgb.cv(params = lgb_grid, data = dtrain, learning_rate = 0.02, num_leaves = 25,
                       num_threads = 2 , nrounds = 7000, early_stopping_rounds = 50,
                       eval_freq = 20, eval = lgb_normalizedgini, nfold = 5, stratified = TRUE)
best_iter <- lgb_model_cv$best_iter

lgb_model <- lgb.train(params = lgb_grid, data = dtrain, learning_rate = 0.02,
                       num_leaves = 25, num_threads = 2 , nrounds = best_iter,
                       eval_freq = 20, eval = lgb_normalizedgini)

# test_x is a matrix, so collect the observed and predicted values in a data frame
results <- data.frame(
  BAM = as.numeric(test_y),
  pred_lightgbm = predict(lgb_model, test_x)
)

ggplot(results, aes(BAM, pred_lightgbm)) + geom_point() + geom_smooth(method = "lm")
summary(lm(BAM ~ pred_lightgbm, data = results))
mean(abs((results$BAM - results$pred_lightgbm) / results$BAM)) * 100  # MAPE (%)

jameslamb commented 3 years ago

Looks like a fine approach to me! By trying different combinations of parameters in the object you've called lgb_grid, you can use the results from lgb.cv() to estimate the expected performance of your model for each set of parameter values.
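As a hedged sketch of that last step: once lgb.cv() has identified a good parameter combination and stopping point, refit a single final model on the full training data. The `dtrain` object and the parameter values below are placeholders standing in for whichever combination scored best.

```r
library(lightgbm)

# Placeholder: the parameter combination that scored best in cross-validation.
best_params <- list(
  objective     = "regression",
  metric        = "l2",
  num_leaves    = 25L,
  learning_rate = 0.02
)

# Re-run CV with the chosen parameters to get the stopping iteration.
cv <- lgb.cv(params = best_params, data = dtrain, nrounds = 7000L,
             nfold = 5L, early_stopping_rounds = 50L, verbose = -1L)

# Refit on all training data, stopping at the CV-selected iteration.
final_model <- lgb.train(params = best_params, data = dtrain,
                         nrounds = cv$best_iter)
```

This mirrors the lgb.cv() / lgb.train() pattern already used in the code above.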

Anything else we can help with?

adithirgis commented 3 years ago

Thanks again! No, I'll try something out. I'll watch this issue in case someone comes up with a method.

jameslamb commented 3 years ago

Ok sounds good! We actually try to keep the list of open issues as small as possible (to focus maintainers' attention), so I'm going to close this for now.

If you'd be interested in contributing a vignette on hyperparameter tuning with the {lightgbm} R package in the future, I'd be happy to help with any questions you have on contributing!

Once the 3.3.0 release (#4310) makes it to CRAN, we'll focus on converting the existing R package demos to vignettes (@mayer79 has already started this in #3946), and I think a hyperparameter tuning one would be very valuable!

github-actions[bot] commented 1 year ago

This issue has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.