Closed. Laurae2 closed this issue 7 years ago.
@Laurae2 Is num_threads working?
@guolinke yes it works. When using 20 threads, the task manager reports only 50% CPU usage instead of 100%.
Thanks for pinging me. Have you tried a lower value of pred_early_stop_margin? I usually use pred_early_stop_margin = 1.5. If you try something very low like 0.1, prediction should be very fast but the results should be almost completely off. Let me know; I'll be happy to take a look if it's indeed not working.
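For context on why a low margin should make prediction fast but inaccurate: as I understand the C++ side, prediction early stopping accumulates per-tree raw scores and stops summing once the score's distance from the decision boundary exceeds pred_early_stop_margin, checked every pred_early_stop_freq trees. A minimal Python sketch of that idea (hypothetical function, not the LightGBM API):

```python
def predict_with_early_stop(tree_outputs, margin=1.5, freq=1):
    """Sketch of prediction early stopping for a binary model:
    accumulate per-tree raw scores and stop once the absolute raw
    score exceeds `margin`, checking every `freq` trees."""
    score = 0.0
    used = 0
    for out in tree_outputs:
        score += out
        used += 1
        if used % freq == 0 and abs(score) >= margin:
            break  # remaining trees are skipped entirely
    return score, used

# A confident prediction stops after two trees with margin = 1.5 ...
score, used = predict_with_early_stop([0.8, 0.9, 0.1, 0.2], margin=1.5)
# ... while a huge margin forces all four trees to be evaluated.
score_all, used_all = predict_with_early_stop([0.8, 0.9, 0.1, 0.2], margin=10.0)
```

With a tiny margin like 0.1 almost every prediction stops after the first check, which is why the output should look nearly random.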
@cbecker I used pred_early_stop_margin = 0.1 and I get results identical to those without it. Am I missing a parameter that must be set to activate early stopping for predictions?
That's weird. I never tried the R package but there could be a bug there or in the code I added for early stopping. Can you point me to a zip file with all the files needed to run this code, including the data? I've never used R but I can take a look if I have the whole working code.
@cbecker Here is a simpler example.
To run it and install the package in R, you need Rtools (on Windows) plus cmake (cmake must be on the PATH; this step is mandatory):
install.packages(c("devtools", "matrixStats"))
devtools::install_github("Microsoft/LightGBM/R-package", force = TRUE)
Then you can try this very simplified example:
library(matrixStats)
generated <- matrix(nrow = 10000, ncol = 10)
for (i in 1:10) {
set.seed(i)
generated[sample.int(10000, 1000, replace = FALSE), i] <- 0
}
gen_labels <- as.numeric(rowAnys(generated, value = 0, na.rm = TRUE)) # 6534
to_sort <- order(gen_labels)
generated <- generated[to_sort, ]
gen_labels <- gen_labels[to_sort]
# devtools::install_github("Microsoft/LightGBM/R-package")
# lgb.unloader(wipe = TRUE)
library(lightgbm)
dtrain <- lgb.Dataset(generated, label = gen_labels)
valids <- list(test = dtrain)
model <- lgb.train(list(objective = "binary",
metric = "l2",
min_data = 1,
learning_rate = 0.1,
pred_early_stop = TRUE,
pred_early_stop_freq = 1,
pred_early_stop_margin = 0.1),
dtrain,
1000,
valids,
early_stopping_rounds = 1)
plot(predict(model, generated))
With pred_early_stop_margin = 0.1 it should have stopped at the first iteration; instead, the loss kept decreasing until the results were near perfect.
Thanks. I think I know where the problem comes from: pred_early_stop is a prediction-time parameter, but you are passing those parameters at training time. @guolinke how is this handled in R? Can we pass those parameters to the predict function?
I have some related insight. Commit ac975e734d6982ad94e6394908cea3bd4bd2744d introduced a bug on my system. I installed the version immediately preceding that commit and compared it to the current version on master.
Reproducible example below.
library(devtools)
install_github("Microsoft/LightGBM", ref = "402474f4063aff3cef9167ecb9f4a035df2736ea", subdir = "R-package")
library(lightgbm)
data(agaricus.train, package = "lightgbm")
train <- agaricus.train
dtrain <- lgb.Dataset(train$data, label = train$label)
params <- list(objective = "regression", metric = "l2")
model <- lgb.cv(params,
dtrain,
1000,
nfold = 5,
min_data = 1,
learning_rate = 0.3,
early_stopping_rounds = 10)
This produces a normal result and returns the model object.
.rs.restartR()
library(devtools)
install_github("Microsoft/LightGBM", subdir = "R-package")
library(lightgbm)
data(agaricus.train, package = "lightgbm")
train <- agaricus.train
dtrain <- lgb.Dataset(train$data, label = train$label)
params <- list(objective = "regression", metric = "l2")
model <- lgb.cv(params,
dtrain,
1000,
nfold = 5,
min_data = 1,
learning_rate = 0.3,
early_stopping_rounds = 10)
But the current version on master returned this error when I ran the same code:
Error in env$model$best_score <- best_score[i] :
cannot add bindings to a locked environment
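For anyone puzzled by that error: R refuses to create a new variable inside an environment that has been locked, which is what the callback does when it assigns best_score onto the model. A loose Python analogy (not LightGBM code; just an illustration using a read-only mapping from the standard library):

```python
from types import MappingProxyType

# A read-only view behaves like a locked R environment:
# reading existing bindings works, but adding a new one fails.
model = {"best_iter": 10}
locked = MappingProxyType(model)

print(locked["best_iter"])  # reading is fine

try:
    locked["best_score"] = 0.5  # like env$model$best_score <- best_score[i]
except TypeError as err:
    print("cannot add binding:", err)
```

The fix on the R side is to create the binding before the environment is locked, or to not lock it at all.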
@luyongxu The bug you have is unrelated to this one (yours is a pure R bug, while mine is an R/C++ wrapping bug or an issue in the C++ backend).
See https://github.com/Microsoft/LightGBM/pull/764 for a fix to your issue.
@Laurae2 I think the predict function in R can accept the additional parameters as well.
@cbecker With @guolinke's help it now works. Passing the parameters at prediction time does the trick:
plot(predict(model, generated,
pred_early_stop = TRUE,
pred_early_stop_freq = 1,
pred_early_stop_margin = 0.1))
plot(predict(model, generated,
pred_early_stop = TRUE,
pred_early_stop_freq = 1,
pred_early_stop_margin = 1))
Please update the lightgbm package; your problem will be resolved.
library(devtools)
options(devtools.install.args = "--no-multiarch") # if you have 64-bit R only, you can skip this
install_github("Microsoft/LightGBM", subdir = "R-package")
ping @cbecker @guolinke
OS: Windows Server 2012 R2
R 3.4.0 compiled with MinGW 7.1
LightGBM compiled with Visual Studio 2017
Prediction early stopping parameters are not working (or are not discoverable in R?)