marjoleinF / pre

an R package for deriving Prediction Rule Ensembles

Hyperparameter tuning #24

Closed markhwhiteii closed 4 years ago

markhwhiteii commented 4 years ago

A number of hyperparameters are set in the pre function, such as sampfrac, maxdepth, learnrate, mtry, and ntrees.

Is there a way to extract overall performance measures (RMSE, ROC AUC, etc.) from objects of class pre? That would make it possible to run something like a grid search over these hyperparameters using the rsample and purrr packages, for example.

If mod is an object of class pre, calling something like glmnet::assess.glmnet(mod$glmnet.fit) produces the following error:

Error in cbind2(1, newx) %*% nbeta : 
  Cholmod error 'X and/or Y have wrong dimensions' at file ../MatrixOps/cholmod_sdmult.c, line 90

marjoleinF commented 4 years ago

You can do hyperparameter tuning using function train from package caret; see ?caret_pre_model (this will hopefully be implemented as a method in caret in the future). With it, you can tune the following parameters:

> caret_pre_model$parameters
        parameter     class                          label
1        sampfrac   numeric           Subsampling Fraction
2        maxdepth   numeric                 Max Tree Depth
3       learnrate   numeric                      Shrinkage
4            mtry   numeric # Randomly Selected Predictors
5        use.grad   logical       Employ Gradient Boosting
6 penalty.par.val character       Regularization Parameter
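
As a rough sketch of what such a tuning run could look like (not necessarily the recommended workflow): the grid below is built by hand with expand.grid, using the parameter names listed above as column names. The grid values, the 2-fold resampling, and ntrees = 10L are assumptions chosen only to keep computation short. Predictors and response are passed through x and y rather than a formula, so that train does not recode factors as dummy variables.

```r
## Sketch: tune pre with caret::train on the airquality data
library("pre")
library("caret")

airq <- airquality[complete.cases(airquality), ]
x <- airq[, -1]   # predictors (Ozone is the first column)
y <- airq$Ozone   # continuous response

## Hand-built grid; column names match caret_pre_model$parameters.
## The values are illustrative assumptions, not recommendations.
tune_grid <- expand.grid(sampfrac = 0.5,
                         maxdepth = 2L:3L,
                         learnrate = c(0.01, 0.1),
                         mtry = Inf,
                         use.grad = TRUE,
                         penalty.par.val = "lambda.1se",
                         stringsAsFactors = FALSE)

set.seed(42)
## ntrees is passed on to pre(); kept small here only for speed
fit <- train(x = x, y = y, method = caret_pre_model,
             trControl = trainControl(method = "cv", number = 2),
             tuneGrid = tune_grid, ntrees = 10L)

fit$results    # resampled performance for each grid row
fit$bestTune   # parameter combination selected by train
```

For a real analysis, a larger ntrees and more resamples would be needed than in this sketch.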

Function assess.glmnet requires specification of new data. At least, if I perform the examples from the documentation of assess.glmnet without specifying the newx and newy arguments, I get a similar error.

You can apply assess.glmnet to the training data as follows:

## Load packages
library("pre")
library("glmnet")
## Fit pre to a continuous response:
airq <- airquality[complete.cases(airquality), ]
set.seed(42)
mod <- pre(Ozone ~ ., data = airq)
assess.glmnet(mod$glmnet.fit, newx = mod$modmat, 
              newy = mod$data$Ozone)

Function pre performs some data preparation internally, so doing this with new test observations will be more involved.
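
One way to sidestep that preparation, sketched here under the assumption that predict for pre objects takes a newdata argument and returns numeric predictions for a continuous response, is to compute the error measure directly from the predictions. The 75/25 split and the hyperparameter values are arbitrary choices for illustration.

```r
## Sketch: hold-out RMSE for a pre model, computed from predictions
library("pre")

airq <- airquality[complete.cases(airquality), ]

## Arbitrary 75/25 train/test split (an assumption for this example)
set.seed(42)
train_idx <- sample(nrow(airq), size = round(0.75 * nrow(airq)))
airq_train <- airq[train_idx, ]
airq_test <- airq[-train_idx, ]

## Fit on the training part with one candidate hyperparameter setting
mod <- pre(Ozone ~ ., data = airq_train, maxdepth = 3L, learnrate = 0.01)

## predict applies the fitted rules to the raw test observations
pred <- predict(mod, newdata = airq_test)

## Root mean squared error on the held-out observations
rmse <- sqrt(mean((airq_test$Ozone - pred)^2))
rmse
```

Repeating this over several hyperparameter settings and resamples gives a hand-rolled version of the grid search asked about above.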