ymattu / MlBayesOpt

R package to tune hyperparameters for machine learning models (Support Vector Machine, Random Forest, and XGBoost) using Bayesian optimization with Gaussian processes

Append Y to ranger data frame to avoid namespace errors, add test #38

Closed: ck37 closed this 7 years ago

ck37 commented 7 years ago

Hello,

This fixes the namespace bug I reported in rf_opt (#37). I don't know if there is a better way to work around this, but I've fixed the issue by cbind-ing Y onto the X data frame. ranger doesn't seem to work with Y passed as a separate vector.
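For reference, here is a minimal sketch of the workaround (function and variable names are illustrative, not the exact code in the PR):

# Sketch of the workaround: ranger() looks up the outcome inside the data
# frame it is given, so bind the label onto the features under a known name.
library(ranger)

fit_rf <- function(train_data, train_label, num_trees, mtry) {
  df <- cbind(train_data, Y = train_label)
  ranger(
    dependent.variable.name = "Y",
    data = df,
    num.trees = num_trees,
    mtry = mtry
  )
}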

I also added the RF example and a Boston example as minimal tests. Unfortunately, the RF example runs into another error, this time from GP_deviance().

Thanks, Chris

ymattu commented 7 years ago

Hello, Chris. Thanks for your bug fix.

I checked your rf_opt() function and it worked.

The test code probably fails because the parameter search gets stuck in a local optimum. In such a case, we should increase the kappa parameter of the acquisition function. I set kappa to 10 (the default is 2.576), and then both the iris test and your Boston test worked. Please try it.
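For context, assuming MlBayesOpt follows the usual upper-confidence-bound (UCB) acquisition used by rBayesianOptimization (where 2.576 is the default kappa), a candidate's score is the GP posterior mean plus kappa times the posterior standard deviation, so a larger kappa pushes the search toward uncertain regions instead of re-sampling near the current best. A minimal sketch, where mu and sigma stand in for the GP posterior at a candidate point:

# Sketch of the UCB acquisition (illustrative only).
ucb <- function(mu, sigma, kappa = 2.576) {
  mu + kappa * sigma  # larger kappa -> more exploration
}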

Iris test

> mod <- rf_opt(
+   train_data = iris_train,
+   train_label = iris_train$Species,
+   test_data = iris_test,
+   test_label = iris_test$Species,
+   mtry_range = c(1L, 4L),
+   kappa = 10
+   )
elapsed = 0.01  Round = 1   num_trees_opt = 271.0000    mtry_opt = 3.0000   Value = 1.0000 
elapsed = 0.01  Round = 2   num_trees_opt = 90.0000 mtry_opt = 4.0000   Value = 1.0000 
elapsed = 0.02  Round = 3   num_trees_opt = 525.0000    mtry_opt = 4.0000   Value = 1.0000 
elapsed = 0.03  Round = 4   num_trees_opt = 864.0000    mtry_opt = 3.0000   Value = 1.0000 
elapsed = 0.04  Round = 5   num_trees_opt = 795.0000    mtry_opt = 4.0000   Value = 1.0000 
elapsed = 0.01  Round = 6   num_trees_opt = 390.0000    mtry_opt = 4.0000   Value = 1.0000 
elapsed = 0.02  Round = 7   num_trees_opt = 420.0000    mtry_opt = 3.0000   Value = 1.0000 
elapsed = 0.02  Round = 8   num_trees_opt = 318.0000    mtry_opt = 2.0000   Value = 1.0000 
elapsed = 0.01  Round = 9   num_trees_opt = 222.0000    mtry_opt = 1.0000   Value = 1.0000 
elapsed = 0.01  Round = 10  num_trees_opt = 160.0000    mtry_opt = 3.0000   Value = 1.0000 
elapsed = 0.03  Round = 11  num_trees_opt = 791.0000    mtry_opt = 2.0000   Value = 1.0000 
elapsed = 0.01  Round = 12  num_trees_opt = 301.0000    mtry_opt = 4.0000   Value = 1.0000 
elapsed = 0.04  Round = 13  num_trees_opt = 935.0000    mtry_opt = 3.0000   Value = 1.0000 
elapsed = 0.01  Round = 14  num_trees_opt = 356.0000    mtry_opt = 2.0000   Value = 1.0000 
elapsed = 0.01  Round = 15  num_trees_opt = 393.0000    mtry_opt = 3.0000   Value = 1.0000 
elapsed = 0.03  Round = 16  num_trees_opt = 674.0000    mtry_opt = 4.0000   Value = 1.0000 
elapsed = 0.04  Round = 17  num_trees_opt = 906.0000    mtry_opt = 2.0000   Value = 1.0000 
elapsed = 0.02  Round = 18  num_trees_opt = 529.0000    mtry_opt = 1.0000   Value = 1.0000 
elapsed = 0.00  Round = 19  num_trees_opt = 2.0000  mtry_opt = 1.0000   Value = 0.9600 
elapsed = 0.01  Round = 20  num_trees_opt = 345.0000    mtry_opt = 4.0000   Value = 1.0000 
elapsed = 0.01  Round = 21  num_trees_opt = 147.0000    mtry_opt = 4.0000   Value = 1.0000 

 Best Parameters Found: 
Round = 1   num_trees_opt = 271.0000    mtry_opt = 3.0000   Value = 1.0000 

Boston test

> set.seed(71)
> res1 <- rf_opt(train_data = x_train,
+                train_label = y_train,
+                test_data = x_test,
+                test_label = y_test,
+                mtry_range = c(1L, ncol(x_train)),
+                # Doesn't work:
+                #num_tree_range = c(500L, 500L)
+                num_tree_range = c(500L, 501L),
+                kappa = 10
+ )
elapsed = 0.08  Round = 1   num_trees_opt = 501.0000    mtry_opt = 8.0000   Value = 0.7941 
elapsed = 0.07  Round = 2   num_trees_opt = 501.0000    mtry_opt = 10.0000  Value = 0.8039 
elapsed = 0.08  Round = 3   num_trees_opt = 501.0000    mtry_opt = 12.0000  Value = 0.8039 
elapsed = 0.05  Round = 4   num_trees_opt = 500.0000    mtry_opt = 4.0000   Value = 0.7843 
elapsed = 0.07  Round = 5   num_trees_opt = 500.0000    mtry_opt = 13.0000  Value = 0.7941 
elapsed = 0.06  Round = 6   num_trees_opt = 501.0000    mtry_opt = 4.0000   Value = 0.7941 
elapsed = 0.06  Round = 7   num_trees_opt = 501.0000    mtry_opt = 9.0000   Value = 0.8039 
elapsed = 0.05  Round = 8   num_trees_opt = 500.0000    mtry_opt = 6.0000   Value = 0.8039 
elapsed = 0.07  Round = 9   num_trees_opt = 500.0000    mtry_opt = 11.0000  Value = 0.7843 
elapsed = 0.06  Round = 10  num_trees_opt = 501.0000    mtry_opt = 9.0000   Value = 0.7843 
elapsed = 0.07  Round = 11  num_trees_opt = 501.0000    mtry_opt = 11.0000  Value = 0.8039 
elapsed = 0.06  Round = 12  num_trees_opt = 500.0000    mtry_opt = 8.0000   Value = 0.7843 
elapsed = 0.05  Round = 13  num_trees_opt = 500.0000    mtry_opt = 2.0000   Value = 0.8039 
elapsed = 0.05  Round = 14  num_trees_opt = 500.0000    mtry_opt = 4.0000   Value = 0.7843 
elapsed = 0.06  Round = 15  num_trees_opt = 500.0000    mtry_opt = 8.0000   Value = 0.7941 
elapsed = 0.05  Round = 16  num_trees_opt = 500.0000    mtry_opt = 3.0000   Value = 0.7843 
elapsed = 0.06  Round = 17  num_trees_opt = 501.0000    mtry_opt = 7.0000   Value = 0.7843 
elapsed = 0.06  Round = 18  num_trees_opt = 501.0000    mtry_opt = 11.0000  Value = 0.8039 
elapsed = 0.07  Round = 19  num_trees_opt = 501.0000    mtry_opt = 13.0000  Value = 0.7941 
elapsed = 0.06  Round = 20  num_trees_opt = 501.0000    mtry_opt = 6.0000   Value = 0.7843 
elapsed = 0.06  Round = 21  num_trees_opt = 501.0000    mtry_opt = 9.0000   Value = 0.7843 

 Best Parameters Found: 
Round = 2   num_trees_opt = 501.0000    mtry_opt = 10.0000  Value = 0.8039 

Best regards, Yuya

ymattu commented 7 years ago

The iris test seems to be unstable... Whether the error occurs depends strongly on set.seed and kappa.
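For what it's worth, a quick way to probe that is to rerun the call under a few seeds and count failures. A sketch, assuming the iris_train / iris_test split from the example above is in the workspace:

# Rerun the iris test under several seeds to see how often it errors.
for (s in c(1, 42, 71)) {
  set.seed(s)
  res <- try(rf_opt(
    train_data = iris_train,
    train_label = iris_train$Species,
    test_data = iris_test,
    test_label = iris_test$Species,
    mtry_range = c(1L, 4L),
    kappa = 10
  ), silent = TRUE)
  cat("seed", s, ":", if (inherits(res, "try-error")) "error" else "ok", "\n")
}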