Closed by germayneng 7 years ago
Please provide a reproducible example
On Mon, May 29, 2017 at 12:20 AM, Germayne notifications@github.com wrote:
Hi,
I am trying to do stacking using caretList. The response is a classification target. From what I understand, RF requires the response to be a factor, while xgboost needs the response to be numeric.
I tried both scenarios: converting the response to a factor and running caretList, and converting it to numeric and running caretList. Both ran.
The latter gives me warnings from RF, since it requires the response to be a factor. So which is right?
Regards, Germayne
@zachmayer
Sorry, I think my question should be: for xgboost itself, if I am doing binary logistic classification, do I leave the response variable as a class factor? Random forest and extra trees require the response variable to be a class factor, but the plain xgboost that I know only takes numeric class labels.
Example code:
library(caret)
library(caretEnsemble)

# LogLosSummary is assumed to be a user-defined summary function (not shown in this post)
control <- trainControl(method = "repeatedcv", number = 10, repeats = 3,
                        savePredictions = "final", classProbs = TRUE,
                        summaryFunction = LogLosSummary)
algorithmList <- c("glm", "knn")

# set grids
# rf_grid <- expand.grid()
xgb_grid <- expand.grid(nrounds = 1000, eta = 0.1, max_depth = 5, gamma = 0,
                        colsample_bytree = 1, min_child_weight = 1, subsample = 1)

# methodList = algorithmList
models <- caretList(
  Class ~ ., data = dataset, trControl = control, metric = "LogLoss",
  methodList = algorithmList,
  tuneList = list(
    et = caretModelSpec(method = "extraTrees", ntree = 1000),
    rf = caretModelSpec(method = "rf", ntree = 1000),
    xgb = caretModelSpec(method = "xgbTree", tuneGrid = xgb_grid)
  )
)
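The example code passes LogLosSummary as the summaryFunction but never defines it, and it is not a function exported by caret. A minimal sketch of what such a function might look like follows, assuming it uses caret's summaryFunction signature (data, lev, model) and returns a value named "LogLoss" to match metric = "LogLoss". The body is an illustration, not the original poster's code.

```r
# Hypothetical stand-in for the undefined LogLosSummary.
# caret calls a summaryFunction with:
#   data  - data frame holding `obs` (observed class) and one probability column per class
#   lev   - character vector of class levels
#   model - name of the model (unused here)
LogLosSummary <- function(data, lev = NULL, model = NULL) {
  eps <- 1e-15
  probs <- as.matrix(data[, lev, drop = FALSE])
  # Clip probabilities away from 0 and 1 so log() stays finite
  probs[] <- pmin(pmax(probs, eps), 1 - eps)
  # For each row, pick the predicted probability of the observed class
  idx <- cbind(seq_len(nrow(probs)), match(as.character(data$obs), lev))
  c(LogLoss = -mean(log(probs[idx])))
}
```

The per-class probability columns are only available when classProbs = TRUE is set in trainControl, as it is in the code above.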
Try using a factor. Caret should convert it to numeric before passing the data to xgboost.
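A minimal sketch of that advice, assuming a numeric 0/1 response stored in a column named Class of a data frame named dataset (both names come from the example code; the toy data and the labels "neg" and "pos" are made up for illustration). Because classProbs = TRUE uses the factor levels as column names for the predicted probabilities, the levels need to be valid R names, so "0" and "1" are relabelled.

```r
# Hypothetical toy data standing in for `dataset`
set.seed(1)
dataset <- data.frame(
  x1 = rnorm(20),
  x2 = rnorm(20),
  Class = rep(c(0, 1), times = 10)  # numeric 0/1 labels, as plain xgboost would take them
)

# Convert the response to a factor once, before calling caretList.
# The labels are arbitrary but must be valid R variable names.
dataset$Class <- factor(dataset$Class, levels = c(0, 1), labels = c("neg", "pos"))

# Quick sanity checks on the converted response
is.factor(dataset$Class)
levels(dataset$Class)
```

With the response stored this way, the same caretList call can serve rf, extraTrees, and xgbTree; per the reply above, caret handles the numeric conversion for xgboost.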
Thank you. :) This cleared up my doubts.