tobigithub / caret-machine-learning

Practical examples for the R caret machine learning package

parRF and caret 6.0-58: no parallel run #8

Open tobigithub opened 8 years ago

tobigithub commented 8 years ago

As reported on Stack Exchange and over at caret, the parRF (parallel random forest) method in caret does not run in parallel. Some say it's a bug in parRF because it uses MPI; some say it's a bug in caret.

> require(caret); data(BloodBrain)
> require(foreach)
> fit2 <- train(bbbDescr, logBBB, "parRF")
> fit2; fit2$times$everything
Parallel Random Forest 

208 samples
134 predictors

No pre-processing
Resampling: Bootstrapped (25 reps) 
Summary of sample sizes: 208, 208, 208, 208, 208, 208, ... 
Resampling results across tuning parameters:

  mtry  RMSE       Rsquared   RMSE SD     Rsquared SD
    2   0.5347574  0.5559162  0.05277319  0.06222596 
   68   0.5350806  0.5361994  0.04816799  0.07307331 
  134   0.5474230  0.5132478  0.05231734  0.08475810 

RMSE was used to select the optimal model using  the smallest value.
The final value used for the model was mtry = 2. 
   user  system elapsed 
  54.31    0.01   54.68 
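
One way to confirm that no parallel backend was active for this run: the foreach package reports the registered backend and worker count, so with nothing registered the calls below should show a single sequential worker (a quick diagnostic sketch, not part of the original report).

require(foreach)
# with no backend registered, foreach falls back to sequential execution
getDoParRegistered()   # FALSE (or only the sequential backend) until e.g. doParallel is registered
getDoParWorkers()      # 1, i.e. everything runs on a single worker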
tobigithub commented 8 years ago

See a solution similar to the one from Steve Weston (Yale).

require(caret); library(doParallel); data(BloodBrain)
# start a PSOCK cluster with one worker per core and load foreach on each worker
cl <- makePSOCKcluster(detectCores())
clusterEvalQ(cl, library(foreach))
registerDoParallel(cl)
fit2 <- train(bbbDescr, logBBB, "parRF")
fit2; fit2$times$everything
# shut down the cluster and return to sequential execution
stopCluster(cl); registerDoSEQ()

#-------------------------------------------------------------------------------------
Loading required package: e1071
Loading required package: randomForest
randomForest 4.6-10
Type rfNews() to see new features/changes/bug fixes.
Parallel Random Forest 

208 samples
134 predictors

No pre-processing
Resampling: Bootstrapped (25 reps) 
Summary of sample sizes: 208, 208, 208, 208, 208, 208, ... 
Resampling results across tuning parameters:

  mtry  RMSE       Rsquared   RMSE SD     Rsquared SD
    2   0.5298241  0.5632903  0.05212148  0.07324073 
   68   0.5392003  0.5274296  0.04519746  0.07125592 
  134   0.5507065  0.5086346  0.04911251  0.07934256 

RMSE was used to select the optimal model using  the smallest value.
The final value used for the model was mtry = 2. 
   user  system elapsed 
   0.67    0.09   23.65
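
Elapsed time drops from roughly 54 s to about 24 s once the PSOCK cluster is registered. A small variation on the same recipe, in case you want to keep one core free for the interactive session and verify how many workers foreach actually sees (the worker count and the object name fit3 are just illustrative choices, not from the report above):

library(caret); library(doParallel); data(BloodBrain)
# illustrative choice: leave one core free for the interactive session
cl <- makePSOCKcluster(max(1, detectCores() - 1))
registerDoParallel(cl)
getDoParWorkers()   # should report the number of cluster workers
fit3 <- train(bbbDescr, logBBB, "parRF")
fit3$times$everything
stopCluster(cl); registerDoSEQ()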