ymattu / MlBayesOpt

R package to tune parameters for machine learning (Support Vector Machine, Random Forest, and XGBoost) using Bayesian optimization with Gaussian processes

Random Forest example fails #37

Closed ck37 closed 7 years ago

ck37 commented 7 years ago

Hello,

I'm trying the random forest example shown in the README and running into an error. Any ideas?

> library(MlBayesOpt)
> set.seed(123)
> mod <- rf_opt(
+   train_data = iris_train,
+   train_label = iris_train$Species,
+   test_data = iris_test,
+   test_label = iris_test$Species,
+   mtry_range = c(1L, 4L)
+ )
Error in eval(f[[2]], envir = data) : object 'trainlabel' not found
Timing stopped at: 0.196 0.003 0.2
> traceback()
15: eval(f[[2]], envir = data)
14: eval(f[[2]], envir = data)
13: data.frame(eval(f[[2]], envir = data))
12: parse.formula(formula, data)
11: ranger(trainlabel ~ ., dtrain, num.trees = num_trees_opt, mtry = mtry_opt)
10: (function (num_trees_opt, mtry_opt) 
    {
        model <- ranger(trainlabel ~ ., dtrain, num.trees = num_trees_opt, 
            mtry = mtry_opt)
        t.pred <- predict(model, dat = dtest)
        Pred <- sum(diag(table(testlabel, t.pred$predictions)))/nrow(dtest)
        list(Score = Pred, Pred = Pred)
    })(num_trees_opt = 288, mtry_opt = 4)
9: do.call(what = FUN, args = as.list(This_Par))
8: system.time({
       This_Score_Pred <- do.call(what = FUN, args = as.list(This_Par))
   })
7: eval(expr, pf)
6: eval(expr, pf)
5: withVisible(eval(expr, pf))
4: evalVis(expr)
3: utils::capture.output({
       This_Time <- system.time({
           This_Score_Pred <- do.call(what = FUN, args = as.list(This_Par))
       })
   })
2: BayesianOptimization(rf_holdout, bounds = list(num_trees_opt = num_tree_range, 
       mtry_opt = mtry_range), init_points, init_grid_dt = NULL, 
       n_iter, acq, kappa, eps, verbose = TRUE)
1: rf_opt(train_data = iris_train, train_label = iris_train$Species, 
       test_data = iris_test, test_label = iris_test$Species, mtry_range = c(1L, 
           4L))

Thanks, Chris
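
The traceback suggests that the response in `trainlabel ~ .` is looked up in the data passed to `ranger()` and then in the parser's own environment, so a `trainlabel` that exists only as a local variable in the calling function is never found. The sketch below reproduces that lookup failure in base R. `parse_response`, `fit_broken`, and `fit_fixed` are hypothetical stand-ins, not the package's code, and attaching the label as a column is only one possible fix, not necessarily the one used in the actual patch.

```r
# Hypothetical stand-in for a formula parser that evaluates the response
# against the data, like the eval(f[[2]], envir = data) frame in the traceback.
parse_response <- function(formula, data) {
  eval(formula[[2]], envir = data)
}

# Mimics the failing pattern: the label is a local variable in the caller,
# not a column of the data frame handed to the parser.
fit_broken <- function(train_data, train_label) {
  trainlabel <- train_label
  parse_response(trainlabel ~ ., train_data)
}

# One possible fix (an assumption): attach the label as a column of the data.
fit_fixed <- function(train_data, train_label) {
  dtrain <- cbind(train_data, trainlabel = train_label)
  parse_response(trainlabel ~ ., dtrain)
}

x <- iris[, 1:4]
y <- iris$Species

# Capture the error message instead of stopping the script.
broken_msg <- tryCatch(fit_broken(x, y), error = function(e) conditionMessage(e))
print(broken_msg)

# With the label stored in the data frame, the lookup succeeds.
fixed <- fit_fixed(x, y)
print(all(fixed == y))
```

The design point is that the formula's variables should live in the data frame itself, which keeps the call independent of where it is evaluated.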

ymattu commented 7 years ago

Hello, Chris. Thanks for your bug fix. (This comment is the same as the one on PR #38.)

I checked your rf_opt() function and it worked.

The test code probably fails because the optimizer gets stuck in a local optimum. In that case, we should increase the kappa parameter of the acquisition function. I set kappa to 10 (the default is 2.576), and then both the Iris test and your Boston test worked. Please try it.
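
To illustrate why a larger kappa can help, here is a minimal sketch that assumes an upper-confidence-bound acquisition of the form mean + kappa * sd. The candidate table, its posterior means and standard deviations, and the helper names `ucb` and `pick` are made up for illustration and are not output from the package.

```r
# Upper confidence bound: posterior mean plus kappa times posterior sd.
ucb <- function(mu, sigma, kappa) mu + kappa * sigma

# Made-up posterior summaries for four candidate mtry values (illustrative only).
candidates <- data.frame(
  mtry  = 1:4,
  mu    = c(0.90, 0.95, 0.97, 0.96),    # predicted score
  sigma = c(0.020, 0.005, 0.002, 0.004) # uncertainty of the prediction
)

# Return the mtry value that maximizes the acquisition for a given kappa.
pick <- function(kappa) {
  scores <- ucb(candidates$mu, candidates$sigma, kappa)
  candidates$mtry[which.max(scores)]
}

print(pick(2.576))  # small kappa favors the best predicted mean (mtry = 3)
print(pick(10))     # large kappa favors the most uncertain point (mtry = 1)
```

With a small kappa the search keeps sampling near the current best estimate, while a larger kappa pushes it toward regions it knows little about, which is how it can escape a local optimum.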

Iris test

> mod <- rf_opt(
+   train_data = iris_train,
+   train_label = iris_train$Species,
+   test_data = iris_test,
+   test_label = iris_test$Species,
+   mtry_range = c(1L, 4L),
+   kappa = 10
+   )
elapsed = 0.01  Round = 1   num_trees_opt = 271.0000    mtry_opt = 3.0000   Value = 1.0000 
elapsed = 0.01  Round = 2   num_trees_opt = 90.0000 mtry_opt = 4.0000   Value = 1.0000 
elapsed = 0.02  Round = 3   num_trees_opt = 525.0000    mtry_opt = 4.0000   Value = 1.0000 
elapsed = 0.03  Round = 4   num_trees_opt = 864.0000    mtry_opt = 3.0000   Value = 1.0000 
elapsed = 0.04  Round = 5   num_trees_opt = 795.0000    mtry_opt = 4.0000   Value = 1.0000 
elapsed = 0.01  Round = 6   num_trees_opt = 390.0000    mtry_opt = 4.0000   Value = 1.0000 
elapsed = 0.02  Round = 7   num_trees_opt = 420.0000    mtry_opt = 3.0000   Value = 1.0000 
elapsed = 0.02  Round = 8   num_trees_opt = 318.0000    mtry_opt = 2.0000   Value = 1.0000 
elapsed = 0.01  Round = 9   num_trees_opt = 222.0000    mtry_opt = 1.0000   Value = 1.0000 
elapsed = 0.01  Round = 10  num_trees_opt = 160.0000    mtry_opt = 3.0000   Value = 1.0000 
elapsed = 0.03  Round = 11  num_trees_opt = 791.0000    mtry_opt = 2.0000   Value = 1.0000 
elapsed = 0.01  Round = 12  num_trees_opt = 301.0000    mtry_opt = 4.0000   Value = 1.0000 
elapsed = 0.04  Round = 13  num_trees_opt = 935.0000    mtry_opt = 3.0000   Value = 1.0000 
elapsed = 0.01  Round = 14  num_trees_opt = 356.0000    mtry_opt = 2.0000   Value = 1.0000 
elapsed = 0.01  Round = 15  num_trees_opt = 393.0000    mtry_opt = 3.0000   Value = 1.0000 
elapsed = 0.03  Round = 16  num_trees_opt = 674.0000    mtry_opt = 4.0000   Value = 1.0000 
elapsed = 0.04  Round = 17  num_trees_opt = 906.0000    mtry_opt = 2.0000   Value = 1.0000 
elapsed = 0.02  Round = 18  num_trees_opt = 529.0000    mtry_opt = 1.0000   Value = 1.0000 
elapsed = 0.00  Round = 19  num_trees_opt = 2.0000  mtry_opt = 1.0000   Value = 0.9600 
elapsed = 0.01  Round = 20  num_trees_opt = 345.0000    mtry_opt = 4.0000   Value = 1.0000 
elapsed = 0.01  Round = 21  num_trees_opt = 147.0000    mtry_opt = 4.0000   Value = 1.0000 

 Best Parameters Found: 
Round = 1   num_trees_opt = 271.0000    mtry_opt = 3.0000   Value = 1.0000 

Boston test

> set.seed(71)
> res1 <- rf_opt(train_data = x_train,
+                train_label = y_train,
+                test_data = x_test,
+                test_label = y_test,
+                mtry_range = c(1L, ncol(x_train)),
+                # Doesn't work:
+                #num_tree_range = c(500L, 500L)
+                num_tree_range = c(500L, 501L),
+                kappa = 10
+ )
elapsed = 0.08  Round = 1   num_trees_opt = 501.0000    mtry_opt = 8.0000   Value = 0.7941 
elapsed = 0.07  Round = 2   num_trees_opt = 501.0000    mtry_opt = 10.0000  Value = 0.8039 
elapsed = 0.08  Round = 3   num_trees_opt = 501.0000    mtry_opt = 12.0000  Value = 0.8039 
elapsed = 0.05  Round = 4   num_trees_opt = 500.0000    mtry_opt = 4.0000   Value = 0.7843 
elapsed = 0.07  Round = 5   num_trees_opt = 500.0000    mtry_opt = 13.0000  Value = 0.7941 
elapsed = 0.06  Round = 6   num_trees_opt = 501.0000    mtry_opt = 4.0000   Value = 0.7941 
elapsed = 0.06  Round = 7   num_trees_opt = 501.0000    mtry_opt = 9.0000   Value = 0.8039 
elapsed = 0.05  Round = 8   num_trees_opt = 500.0000    mtry_opt = 6.0000   Value = 0.8039 
elapsed = 0.07  Round = 9   num_trees_opt = 500.0000    mtry_opt = 11.0000  Value = 0.7843 
elapsed = 0.06  Round = 10  num_trees_opt = 501.0000    mtry_opt = 9.0000   Value = 0.7843 
elapsed = 0.07  Round = 11  num_trees_opt = 501.0000    mtry_opt = 11.0000  Value = 0.8039 
elapsed = 0.06  Round = 12  num_trees_opt = 500.0000    mtry_opt = 8.0000   Value = 0.7843 
elapsed = 0.05  Round = 13  num_trees_opt = 500.0000    mtry_opt = 2.0000   Value = 0.8039 
elapsed = 0.05  Round = 14  num_trees_opt = 500.0000    mtry_opt = 4.0000   Value = 0.7843 
elapsed = 0.06  Round = 15  num_trees_opt = 500.0000    mtry_opt = 8.0000   Value = 0.7941 
elapsed = 0.05  Round = 16  num_trees_opt = 500.0000    mtry_opt = 3.0000   Value = 0.7843 
elapsed = 0.06  Round = 17  num_trees_opt = 501.0000    mtry_opt = 7.0000   Value = 0.7843 
elapsed = 0.06  Round = 18  num_trees_opt = 501.0000    mtry_opt = 11.0000  Value = 0.8039 
elapsed = 0.07  Round = 19  num_trees_opt = 501.0000    mtry_opt = 13.0000  Value = 0.7941 
elapsed = 0.06  Round = 20  num_trees_opt = 501.0000    mtry_opt = 6.0000   Value = 0.7843 
elapsed = 0.06  Round = 21  num_trees_opt = 501.0000    mtry_opt = 9.0000   Value = 0.7843 

 Best Parameters Found: 
Round = 2   num_trees_opt = 501.0000    mtry_opt = 10.0000  Value = 0.8039 

However, the Iris test seems to be very unstable. I will change the default example in the README.

Much appreciated, Yuya