zachmayer / caretEnsemble

caret models all the way down :turtle:

Response as factor or numeric ( rf and xgbTree ) #229

Closed: germayneng closed this issue 7 years ago

germayneng commented 7 years ago

Hi,

I am trying to do stacking using caretList. The response is a classification (class) variable. From what I understand, random forest requires the response to be a factor, while xgboost needs the response to be numeric.

I tried both scenarios: converting the response to a factor and running caretList, and converting it to numeric and running caretList. Both ran.

The latter gives me warnings from RF, since it requires the response to be a factor. So which is correct?
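
For context, here is a minimal sketch of the difference at the raw-package level (synthetic data, not my actual dataset):

# randomForest wants a factor response; xgboost wants a numeric 0/1 label
library(randomForest)
library(xgboost)

set.seed(1)
x <- data.frame(x1 = rnorm(100), x2 = rnorm(100))
y <- as.integer(x$x1 + rnorm(100) > 0)  # numeric 0/1 label

rf_fit  <- randomForest(x = x, y = factor(y), ntree = 100)        # factor response
xgb_fit <- xgboost(data = as.matrix(x), label = y, nrounds = 50,  # numeric label
                   objective = "binary:logistic", verbose = 0)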

Regards, Germayne

zachmayer commented 7 years ago

Please provide a reproducible example


germayneng commented 7 years ago

@zachmayer

Sorry, I think my question should be: for xgboost itself, if I am doing binary logistic classification, do I leave the response variable as a class factor? Random forest and extra trees require the response to be a factor, but the plain xgboost interface that I know only takes a numeric class variable.

Example code:


library(caret)
library(caretEnsemble)

control <- trainControl(method = "repeatedcv", number = 10, repeats = 3,
                        savePredictions = "final", classProbs = TRUE,
                        summaryFunction = LogLosSummary)  # LogLosSummary: custom log-loss summary defined elsewhere
algorithmList <- c('glm', 'knn')

# set grids
# rf_grid <- expand.grid()
xgb_grid <- expand.grid(nrounds = 1000, eta = 0.1, max_depth = 5, gamma = 0,
                        colsample_bytree = 1, min_child_weight = 1, subsample = 1)

# `dataset` is my own data frame, defined elsewhere
models <- caretList(Class ~ ., data = dataset, trControl = control, metric = "LogLoss",
                    methodList = algorithmList,
                    tuneList = list(
                      et  = caretModelSpec(method = "extraTrees", ntree = 1000),
                      rf  = caretModelSpec(method = "rf", ntree = 1000),
                      xgb = caretModelSpec(method = "xgbTree", tuneGrid = xgb_grid)
                    ))
zachmayer commented 7 years ago

Try using a factor. Caret should convert it to numeric before passing the data to xgboost.
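
For example, something along these lines should run with a factor response for both rf and xgbTree (a minimal sketch using synthetic two-class data rather than your dataset):

library(caret)
library(caretEnsemble)

set.seed(42)
dat <- data.frame(x1 = rnorm(200), x2 = rnorm(200))
dat$Class <- factor(ifelse(dat$x1 + dat$x2 + rnorm(200) > 0, "yes", "no"))  # factor with valid R level names

ctrl <- trainControl(method = "cv", number = 5, savePredictions = "final", classProbs = TRUE)

models <- caretList(Class ~ ., data = dat, trControl = ctrl,
                    tuneList = list(
                      rf  = caretModelSpec(method = "rf", ntree = 100),
                      xgb = caretModelSpec(method = "xgbTree",
                                           tuneGrid = expand.grid(nrounds = 50, eta = 0.1, max_depth = 3,
                                                                  gamma = 0, colsample_bytree = 1,
                                                                  min_child_weight = 1, subsample = 1))
                    ))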



germayneng commented 7 years ago

Thank you. :) This cleared up my doubts.