Closed: SerigneCisse closed this issue 7 years ago
@wxchan
@guolinke I am not familiar with R-package. Maybe @Laurae2 can help.
Assuming the first metric and the first validation dataset are the ones used for early stopping, and assuming it is a metric minimization task, with the model stored in a variable named `model`:

```r
min(as.numeric(unlist(model$record_evals[[2]][[1]])))
# or more simply with best_iter
as.numeric(unlist(model$record_evals[[2]][[1]]))[model$best_iter]
# or again more simply with best_iter
model$record_evals[[2]][[1]][[1]][[model$best_iter]]
```
should do the task.
@guolinke Do you know where to add it in the R-package? (And how do we know whether it is a minimization or maximization task?)
Example code:

```r
library(lightgbm)
data(agaricus.train, package = "lightgbm")
train <- agaricus.train
dtrain <- lgb.Dataset(train$data, label = train$label)
data(agaricus.test, package = "lightgbm")
test <- agaricus.test
dtest <- lgb.Dataset.create.valid(dtrain, test$data, label = test$label)
params <- list(objective = "regression", metric = "l2")
valids <- list(test = dtest)
model <- lgb.train(params,
                   dtrain,
                   100,
                   valids,
                   min_data = 1,
                   learning_rate = 1,
                   early_stopping_rounds = 10)

which.min(as.numeric(unlist(model$record_evals[[2]][[1]]))) # aka best_iter
min(as.numeric(unlist(model$record_evals[[2]][[1]]))) # aka best_score
as.numeric(unlist(model$record_evals[[2]][[1]]))[model$best_iter] # another way with best_iter
model$record_evals[[2]][[1]][[1]][[model$best_iter]] # probably simpler way with best_iter
```
Hi Laurae,
I tried the last one of your examples and it works well for both minimization (binary log-loss) and maximization (F1 score).
However, by default it gave me the score of the training set (not the validation set). So I changed the `valids` argument of `lgb.train`
from this: `valids <- list(train = dtrain, eval = dval)`
to this: `valids <- list(eval = dval)`
in order to get the right output.
Thank you very much. And thanks all the community for your amazing work.
@Laurae2 The best_iter is set in this line: https://github.com/Microsoft/LightGBM/blob/master/R-package/R/callback.R#L353 We can add best_score into the Booster and set it there.
btw @guolinke and @wxchan
Do you know if the Python version has a best_score attribute? Or would the code given by Laurae work in Python (replacing `$` with `.`)?
Thanks, that seems to work (at least I can see the score). However (and sorry for the novice question), it is in dict format. My goal is to get the (floating-point) best_score from each validation fold, then average them outside the loop.
So I tried this:

```python
cv_sum = 0
# looping through folds:
cv_score = clf.best_score[valid_set_name]
cv_sum = cv_sum + cv_score
score = cv_sum / folds
```

But that doesn't work (because of the dict format?)
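If I understand the Python API correctly, `best_score` is a nested dict mapping each validation set name to a dict of metric name → value, so you need to index one level deeper to get the float. A minimal sketch of the averaging, where the fold loop is simulated with hard-coded dicts and the names `"eval"` and `"binary_logloss"` are placeholders for your own validation set and metric names:

```python
# best_score per fold is a nested dict: {dataset_name: {metric_name: value}}.
# Here three folds' best_score dicts are hard-coded instead of training models.
fold_best_scores = [
    {"eval": {"binary_logloss": 0.42}},
    {"eval": {"binary_logloss": 0.40}},
    {"eval": {"binary_logloss": 0.44}},
]

cv_sum = 0.0
for best_score in fold_best_scores:
    # index by validation set name, then by metric name, to get the float
    cv_sum += best_score["eval"]["binary_logloss"]

score = cv_sum / len(fold_best_scores)
print(score)  # 0.42
```

Inside a real CV loop you would use `clf.best_score[valid_set_name][metric_name]` in place of the hard-coded dicts.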
Hi! I am trying to use this wonderful package for the first time in R. It's really fast and it works fine.
However, I was not able to get the score for the early stopping round when I tried model$best_score for a classification task. Could anybody help me with this issue? Thanks in advance.