topepo / caret

caret (Classification And Regression Training) is an R package that contains miscellaneous functions for training and plotting classification and regression models.
http://topepo.github.io/caret/index.html

evtree performance drop with caret_6.0-52 under WINx64. #281

Closed. tobigithub closed this issue 8 years ago.

tobigithub commented 8 years ago

Hi, I wonder what happened to the evtree method: training with evtree now takes about double the time, and for some of the larger data sets it is also no longer parallelized and just computes on one core even if more cores are available. Thank you, Tobias
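
For reference, a minimal way to check whether train() is actually using the registered workers is to time the same evtree fit under a sequential and a parallel backend. The sketch below uses the small iris data as a stand-in for the larger data sets mentioned above, so the absolute times are only illustrative:

library(caret)
library(evtree)
library(doParallel)

data(iris)

fit_once <- function() {
  train(iris[, 1:4], iris[, 5], method = "evtree", tuneLength = 1,
        trControl = trainControl(method = "cv"))
}

# sequential baseline
registerDoSEQ()
t_seq <- system.time(fit_once())

# parallel run on all logical cores
cl <- makeCluster(detectCores(logical = TRUE))
registerDoParallel(cl)
t_par <- system.time(fit_once())
stopCluster(cl)
registerDoSEQ()

# elapsed time should drop noticeably if the workers are actually used
rbind(sequential = t_seq, parallel = t_par)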

topepo commented 8 years ago

We will need reproducible examples to test with.

tobigithub commented 8 years ago

Hi, similar to the Microsoft Stress Team, which keeps testing hundreds or thousands of different computers with hundreds of packages, I keep statistics on the run-times of all caret models via modelFit$times$everything, so I can see which methods change with updates and upgrades.
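
A minimal sketch of that kind of bookkeeping (the model list and the table layout are illustrative, not the exact script behind the numbers below; it assumes the rpart and evtree packages are installed):

library(caret)
data(iris)

# a couple of quick methods just to illustrate the bookkeeping
methods <- c("rpart", "evtree")

timings <- lapply(methods, function(m) {
  fit <- train(iris[, 1:4], iris[, 5], method = m, tuneLength = 1,
               trControl = trainControl(method = "cv"))
  data.frame(method  = m,
             caret   = as.character(packageVersion("caret")),
             elapsed = unname(fit$times$everything["elapsed"]))
})

# one row per method: method, caret version, elapsed seconds
do.call(rbind, timings)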

So for the iris problem there was no significant time change across caret versions. I did not change the evtree version, which in this case does not matter. This test covers only the three-class problem, not the two-class version. Three manual repeats, no changes, hence topic closed.

#   caret_6.0-58 and evtree_1.0-0 
#   user  system elapsed 
#   1.20    0.03    9.42 

#   caret_6.0-52 and evtree_1.0-0 
#   user  system elapsed 
#   2.23    0.05   10.95 

#   caret_6.0-47 and evtree_1.0-0 
#   user  system elapsed 
#   2.22    0.00   10.56 

Code I used:

# performance drop in evtree method 
# caret_6.0-52 under WINx64.
# Tobias Kind (2015)

require(caret)
require(partykit)
require(grid)
require(evtree)

data(iris)
sessionInfo()

#-----------------------------------------------------------
# The parallel package ships with base R, no CRAN install required
library(parallel)
nCores <- detectCores(logical = FALSE)
nThreads <- detectCores(logical = TRUE)
# cat("CPU with",nCores,"cores and",nThreads,"threads detected.\n")

# load the doParallel/doSNOW library for caret cluster use
library(doParallel)
cl <- makeCluster(nThreads)
registerDoParallel(cl)

#-----------------------------------------------------------

# Three-class problem
TrainData <- iris[,1:4]
TrainClasses <- iris[,5]

rfFit <- train(TrainData, TrainClasses, method = "evtree",
                preProcess = c("center", "scale"),
                tuneLength = 1,
                trControl = trainControl(method = "cv"))

rfFit
confusionMatrix(rfFit)
rfFit$times$everything

#   caret_6.0-52 and evtree_1.0-0 
#   user  system elapsed 
#   2.23    0.05   10.95 

#   caret_6.0-47 and evtree_1.0-0 
#   user  system elapsed 
#   2.22    0.00   10.56 

#------------------------------------------------------------
stopCluster(cl)
registerDoSEQ()
### END
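
As a side note on the cluster setup above: whether caret actually hands the resampling loop to the workers can be checked through foreach, and resampling is only parallelized while allowParallel is left at its default of TRUE in trainControl(). A small check, assuming the doParallel backend registered in the script is still active:

library(caret)
library(foreach)
getDoParWorkers()   # number of workers the registered backend exposes
getDoParName()      # "doSEQ" here would mean the sequential fallback is in use

# resampling runs in parallel only while allowParallel = TRUE (the default)
ctrl <- trainControl(method = "cv", allowParallel = TRUE)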

tobigithub commented 8 years ago

Hi,
OK, now for the two-class problem I observed a roughly 6-fold speed increase with the newer caret. I tested caret_6.0-52 and also moved to caret_6.0-58, and the classification is faster; I am not sure what caused the earlier slowdown, but a two-class problem with fewer cases certainly should not take longer than the three-class problem, so this got fixed with version caret_6.0-58. Case closed.

#   caret_6.0-58 and evtree_1.0-0 
#   user  system elapsed 
#   0.84    0.01    6.85 

#   caret_6.0-47 and evtree_1.0-0 
#   user  system elapsed 
#   10.31    0.02   38.28 

Code I used:

# WAS: performance drop in evtree method 
# Tobias Kind (2015)

require(caret)
require(partykit)
require(grid)
require(evtree)

data(iris)
sessionInfo()

#-----------------------------------------------------------
# The parallel package ships with base R, no CRAN install required
library(parallel)
nCores <- detectCores(logical = FALSE)
nThreads <- detectCores(logical = TRUE)
# cat("CPU with",nCores,"cores and",nThreads,"threads detected.\n")

# load the doParallel/doSNOW library for caret cluster use
library(doParallel)
cl <- makeCluster(nThreads)
registerDoParallel(cl)

#-----------------------------------------------------------

# use rows 1:100 of iris (setosa and versicolor only) for the two-class problem,
# then drop the unused level from the class factor
TrainData <- iris[1:100,1:4]
TrainClasses <- factor(iris[1:100,5],exclude=NULL)
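# Note: droplevels() is an equivalent, arguably clearer way to drop the
# unused "virginica" level from the 100-row subset; it gives the same
# two-level factor as factor(..., exclude = NULL) above:
# TrainClasses <- droplevels(iris[1:100, 5])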
summary(TrainClasses)

rfFit <- train(TrainData, TrainClasses, method = "evtree",
                preProcess = c("center", "scale"),
                tuneLength = 1,
                trControl = trainControl(method = "cv"))

rfFit
confusionMatrix(rfFit)
rfFit$times$everything

#   caret_6.0-58 and evtree_1.0-0 
#   user  system elapsed 
#   0.84    0.01    6.85 

#   caret_6.0-47 and evtree_1.0-0 
#   user  system elapsed 
#   10.31    0.02   38.28 

#------------------------------------------------------------
stopCluster(cl)
registerDoSEQ()
### END

Cheers, Tobias