[Open] yongfanbeta opened this issue 6 years ago
I had exactly the same problem. Is linear regression not supported in this case?
@Victorfy @shakfu
Thank you for using MlBayesOpt. I reproduced the same error. For now, this is a bug in the package, so I will fix it in the next version.
I'm very sorry... Please wait for some time until I fix it, or I welcome your pull request.
I also encountered the same problem! Looking forward to your next update! Thanks!

```
Error in xgb.iter.update(fd$bst, fd$dtrain, iteration - 1, obj) :
  Invalid Parameter format for num_class expect int but value='NA'
In addition: There were 50 or more warnings (use warnings() to see the first 50)
Timing stopped at: 11.87 12.9 26.54
```
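For what it's worth, the message appears to come straight from xgboost's parameter parser. Here is a hypothetical minimal sketch of the same complaint, assuming the unset `classes` argument ends up as an `NA` in `num_class` (the data and parameter values below are made up for illustration, not taken from MlBayesOpt's source):

```r
library(xgboost)

# Hypothetical reproduction: a regression objective plus num_class = NA.
# xgboost serializes NA as the string "NA" and rejects it as a non-integer,
# producing the same "Invalid Parameter format for num_class" error.
d <- xgb.DMatrix(as.matrix(iris[, 2:4]), label = iris$Sepal.Length)
xgb.cv(params = list(objective = "reg:linear", num_class = NA),
       data = d, nrounds = 5, nfold = 2)
```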
Thanks for the great package! Any update on this issue or workarounds for running xgb_opt with reg:linear?
I also encountered the same problem when I tried fitting a regression model. Have you figured out how to fix it?
The same problem still remains for `reg:linear`.
You need to comment out the `num_class = num_classes` line in the `else` branch of the "#about classes" section of the `xgb_cv_opt` function. The section begins:

```r
if (grepl("logi", objectfun) == TRUE){
  xgb_cv <- function(object_fun,
                     eval_met,
                     num_classes,
```

So if the objective function is `binary:logistic`, it matches `logi` and the `if` branch handles `num_classes` correctly. However, when the objective does not match `logi`, i.e. is not `binary:logistic`, the `else` branch runs instead, and that branch also passes the `num_classes` object to `xgb.cv`, even though `reg:linear` does not use `num_class` at all. The `num_classes` object appears in both the `if` and the `else` parts of the code. I pushed a pull request to highlight where the error is occurring. However, I still get a warning message on an unrelated issue.
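To make the idea concrete, here is a minimal sketch (my own illustration, not the package's actual code) of guarding the parameter so it is only passed for objectives that need it; `objectfun` and `classes` are the arguments of `xgb_cv_opt`:

```r
# Hypothetical guard: only multiclass objectives (multi:softmax,
# multi:softprob) require num_class; binary:logistic and reg:linear
# must not receive it.
params <- list(booster = "gbtree", objective = objectfun)
if (grepl("multi", objectfun)) {
  params$num_class <- classes
}
```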
Running the following should solve the issue (however, I have only checked it on the iris data set):
```r
xgb_cv_opt <- function(data,
                       label,
                       objectfun,
                       evalmetric,
                       n_folds,
                       eta_range = c(0.1, 1L),
                       max_depth_range = c(4L, 6L),
                       nrounds_range = c(70, 160L),
                       subsample_range = c(0.1, 1L),
                       bytree_range = c(0.4, 1L),
                       init_points = 4,
                       n_iter = 10,
                       acq = "ei",
                       kappa = 2.576,
                       eps = 0.0,
                       optkernel = list(type = "exponential", power = 2),
                       classes = NULL,
                       seed = 0) {
  if (class(data)[1] == "dgCMatrix") {
    dtrain <- xgb.DMatrix(data, label = label)
    xg_watchlist <- list(msr = dtrain)
    cv_folds <- KFold(label, nfolds = n_folds,
                      stratified = TRUE, seed = seed)
  } else {
    quolabel <- enquo(label)
    datalabel <- (data %>% select(!! quolabel))[[1]]
    mx <- sparse.model.matrix(datalabel ~ ., data)
    if (class(datalabel) == "factor") {
      dtrain <- xgb.DMatrix(mx, label = as.integer(datalabel) - 1)
    } else {
      dtrain <- xgb.DMatrix(mx, label = datalabel)
    }
    xg_watchlist <- list(msr = dtrain)
    cv_folds <- KFold(datalabel, nfolds = n_folds,
                      stratified = TRUE, seed = seed)
  }
  #about classes
  if (grepl("logi", objectfun) == TRUE) {
    xgb_cv <- function(object_fun,
                       eval_met,
                       num_classes,
                       eta_opt,
                       max_depth_opt,
                       nrounds_opt,
                       subsample_opt,
                       bytree_opt) {
      object_fun <- objectfun
      eval_met <- evalmetric
      cv <- xgb.cv(params = list(booster = "gbtree",
                                 nthread = 1,
                                 objective = object_fun,
                                 eval_metric = eval_met,
                                 eta = eta_opt,
                                 max_depth = max_depth_opt,
                                 subsample = subsample_opt,
                                 colsample_bytree = bytree_opt,
                                 lambda = 1, alpha = 0),
                   data = dtrain, folds = cv_folds,
                   watchlist = xg_watchlist,
                   prediction = TRUE, showsd = TRUE,
                   early_stopping_rounds = 5, maximize = TRUE, verbose = 0,
                   nrounds = nrounds_opt)
      if (eval_met %in% c("auc", "ndcg", "map")) {
        s <- max(cv$evaluation_log[, 4])
      } else {
        s <- max(-(cv$evaluation_log[, 4]))
      }
      list(Score = s,
           Pred = cv$pred)
    }
  } else {
    xgb_cv <- function(object_fun,
                       eval_met,
                       num_classes,
                       eta_opt,
                       max_depth_opt,
                       nrounds_opt,
                       subsample_opt,
                       bytree_opt) {
      object_fun <- objectfun
      eval_met <- evalmetric
      num_classes <- classes
      cv <- xgb.cv(params = list(booster = "gbtree",
                                 nthread = 1,
                                 objective = object_fun,
                                 # num_class = num_classes,  # commented out: reg:linear does not accept num_class
                                 eval_metric = eval_met,
                                 eta = eta_opt,
                                 max_depth = max_depth_opt,
                                 subsample = subsample_opt,
                                 colsample_bytree = bytree_opt,
                                 lambda = 1, alpha = 0),
                   data = dtrain, folds = cv_folds,
                   watchlist = xg_watchlist,
                   prediction = TRUE, showsd = TRUE,
                   early_stopping_rounds = 5, maximize = TRUE, verbose = 0,
                   nrounds = nrounds_opt)
      if (eval_met %in% c("auc", "ndcg", "map")) {
        s <- max(cv$evaluation_log[, 4])
      } else {
        s <- max(-(cv$evaluation_log[, 4]))
      }
      list(Score = s,
           Pred = cv$pred)
    }
  }
  opt_res <- BayesianOptimization(xgb_cv,
                                  bounds = list(eta_opt = eta_range,
                                                max_depth_opt = max_depth_range,
                                                nrounds_opt = nrounds_range,
                                                subsample_opt = subsample_range,
                                                bytree_opt = bytree_range),
                                  init_points,
                                  init_grid_dt = NULL,
                                  n_iter,
                                  acq,
                                  kappa,
                                  eps,
                                  optkernel,
                                  verbose = TRUE)
  return(opt_res)
}
```
```r
library(MlBayesOpt)
library(dplyr)
library(Matrix)
library(xgboost)
library(rBayesianOptimization)

df <- iris
label_Species <- iris$Species

xgb_cv_opt(data = df,
           label = label_Species,
           objectfun = "reg:linear", evalmetric = "rmse", n_folds = 2,
           eta_range = c(0.1, 1L),
           max_depth_range = c(4L, 6L), nrounds_range = c(70, 160L),
           subsample_range = c(0.1, 1L), bytree_range = c(0.4, 1L),
           init_points = 4, n_iter = 10, acq = "ucb", kappa = 2.576, eps = 0,
           optkernel = list(type = "exponential", power = 2), classes = NULL,
           seed = 0)
```
I get the following warning messages:

```
Warning messages:
1: In matrix(c(sample(index), rep(NA, NA_how_many)), ncol = nfolds) :
  data length [15] is not a sub-multiple or multiple of the number of rows [8]
2: In matrix(c(sample(index), rep(NA, NA_how_many)), ncol = nfolds) :
  data length [43] is not a sub-multiple or multiple of the number of rows [22]
3: In matrix(c(sample(index), rep(NA, NA_how_many)), ncol = nfolds) :
  data length [109] is not a sub-multiple or multiple of the number of rows [55]
4: In matrix(c(sample(index), rep(NA, NA_how_many)), ncol = nfolds) :
  data length [107] is not a sub-multiple or multiple of the number of rows [54]
5: In matrix(c(sample(index), rep(NA, NA_how_many)), ncol = nfolds) :
  data length [133] is not a sub-multiple or multiple of the number of rows [67]
```
I have traced these to this part of the code:

```r
cv_folds <- KFold(datalabel, nfolds = n_folds,
                  stratified = TRUE, seed = seed)
```
I had this solved, but lost the unsaved changes when I switched projects in R. If I recall correctly, I set `datalabel` or `label` to a new numeric vector or Matrix.
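For anyone hitting the same warnings, here is a hedged sketch of that lost change, assuming the idea was simply to hand `KFold` a plain numeric vector (an untested reconstruction, not the original fix):

```r
# KFold()'s stratified sampling produced the matrix() padding warnings above
# with the factor label; converting to a plain numeric vector before
# splitting is one way the fix may have looked.
datalabel_num <- as.numeric(datalabel)
cv_folds <- KFold(datalabel_num, nfolds = n_folds,
                  stratified = TRUE, seed = seed)
```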
Hello,
When I use MlBayesOpt to optimize an xgboost model for a linear regression problem, like predicting house prices, I choose `objectfun = "reg:linear"`. Since this is not a classification problem, there is no `classes` parameter to give, but it seems I still have to supply a `num_class`? Hoping for your reply!