Closed: adithirgis closed this issue 3 years ago.
Thanks for using LightGBM!
We don't have any example documentation of performing grid search specifically in the R package, but you could consult the following:
?lightgbm::lgb.cv
(or https://lightgbm.readthedocs.io/en/latest/R/reference/lgb.cv.html): using LightGBM-specific cross validation to estimate how well a LightGBM model will generalize.

Thank you for the prompt response. I tried something similar; I'm not sure if it is elegant.
library(lightgbm)
library(Matrix)
library(MLmetrics)
library(ggplot2)  # needed for the plotting call below
file_shared <- data
train_ind <- sample(seq_len(nrow(file_shared)), size = (nrow(file_shared) * 0.75))
train_x <- as.matrix(file_shared[train_ind, c("hour", "PA_RH", "PA_Temp", "PA_CF_ATM") ])
train_y <- as.numeric(file_shared[train_ind, "BAM"])  # label must be a vector, not a matrix
test_x <- as.matrix(file_shared[-train_ind, c("hour", "PA_RH", "PA_Temp", "PA_CF_ATM") ])
test_y <- as.numeric(file_shared[-train_ind, "BAM"])
dtrain <- lgb.Dataset(train_x, label = train_y)
lgb_grid <- list(objective = "regression",
metric = "l2",
min_sum_hessian_in_leaf = 1,
feature_fraction = 0.7,
bagging_fraction = 0.7, # cannot write c(0, 0.5, 0.7)
bagging_freq = 5,
min_data = 100,
max_bin = 50,
lambda_l1 = 8,
lambda_l2 = 1.3,
min_data_in_bin = 100,
min_gain_to_split = 10,
min_data_in_leaf = 30,
is_unbalance = TRUE)
lgb_normalizedgini <- function(preds, dtrain){
actual <- getinfo(dtrain, "label")
score <- NormalizedGini(preds, actual)
return(list(name = "gini", value = score, higher_better = TRUE))
}
lgb_model_cv <- lgb.cv(params = lgb_grid, data = dtrain, learning_rate = 0.02, num_leaves = 25,
num_threads = 2 , nrounds = 7000, early_stopping_rounds = 50,
eval_freq = 20, eval = lgb_normalizedgini, nfold = 5, stratified = TRUE)
best_iter <- lgb_model_cv$best_iter
lgb_model <- lgb.train(params = lgb_grid, data = dtrain, learning_rate = 0.02,
num_leaves = 25, num_threads = 2 , nrounds = best_iter,
eval_freq = 20, eval = lgb_normalizedgini)
# predict() returns a vector; test_x is a matrix, so it has no $ columns.
# Collect predictions alongside the held-out labels in a data.frame instead.
results <- data.frame(BAM = test_y,
                      pred_lightgbm = predict(lgb_model, test_x))
ggplot(results, aes(BAM, pred_lightgbm)) + geom_point() + geom_smooth(method = "lm")
summary(lm(BAM ~ pred_lightgbm, data = results))
mean(abs((results$BAM - results$pred_lightgbm) / results$BAM)) * 100  # MAPE (%)
Looks like a fine approach to me! By trying different combinations of parameters in the object you've called lgb_grid with this approach, you could use the results from lgb.cv() to estimate the expected performance of your model under those parameter values.
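To make that concrete, here is a minimal grid-search sketch along the lines suggested above. It is not from the thread: the candidate values in expand.grid() are illustrative assumptions, and it assumes dtrain has already been built as in the script earlier. Each combination is run through lgb.cv() and the cross-validated score is collected so the best setting can be picked.

```r
library(lightgbm)

# Hypothetical candidate values -- substitute the ranges you care about
param_grid <- expand.grid(
  bagging_fraction = c(0.5, 0.7, 0.9),
  lambda_l2        = c(0.5, 1.3, 2.0)
)

grid_results <- lapply(seq_len(nrow(param_grid)), function(i) {
  params <- list(
    objective        = "regression",
    metric           = "l2",
    bagging_fraction = param_grid$bagging_fraction[i],
    lambda_l2        = param_grid$lambda_l2[i]
  )
  cv <- lgb.cv(params = params, data = dtrain, learning_rate = 0.02,
               num_leaves = 25, nrounds = 1000,
               early_stopping_rounds = 50, nfold = 5, verbose = -1)
  # record the parameters together with the best CV round and score
  data.frame(param_grid[i, ], best_iter = cv$best_iter,
             best_score = cv$best_score)
})
grid_results <- do.call(rbind, grid_results)
grid_results[order(grid_results$best_score), ]  # lowest l2 first
```

The winning row's parameters and best_iter can then be passed to lgb.train() on the full training set, as in the script above.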
Anything else we can help with?
Thanks again! No, I'll try something out. I will watch this issue in case someone comes up with a method.
Ok sounds good! We actually try to keep the list of open issues as small as possible (to focus maintainers' attention), so I'm going to close this for now.
If you'd be interested in contributing a vignette on hyperparameter tuning with the {lightgbm}
R package in the future, I'd be happy to help with any questions you have on contributing!
Once the 3.3.0 release (#4310) makes it to CRAN, we'll focus on converting the existing R package demos to vignettes (@mayer79 has already started this in #3946), and I think a hyperparameter tuning one would be very valuable!
This issue has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.
Not sure where I could ask this. Are there tutorials / resources for tuning lightGBM using grid search or any other method in R? I want to tune the hyperparameters in LightGBM using the original lightGBM package in R, without using tidymodels. I use this resource for now: https://www.kaggle.com/andrewmvd/lightgbm-in-r. Thank you!