topepo / caret

caret (Classification And Regression Training): an R package containing miscellaneous functions for training and plotting classification and regression models
http://topepo.github.io/caret/index.html

Issue with multiprocessing #63

Closed trufanov-nok closed 9 years ago

trufanov-nok commented 10 years ago

Hi, I've noticed an issue with multiprocessing in caret (this is my first attempt to use the package). My PC runs 64-bit Ubuntu 14.x on a 4-core CPU. The R implementation is Revolution R Open 8.0 beta. The problem can be reproduced on the data from one of the Kaggle competitions: http://www.kaggle.com/c/afsis-soil-properties/data The code is:

dataset <- read.csv("training.csv", na.strings=c(".", "NA", "", "?"), strip.white=TRUE, encoding="UTF-8",header=TRUE,stringsAsFactors=FALSE)
data <- dataset[,-c(1,3595:3600)]
library(caret)
idx <- createDataPartition(dataset[,"P"], 1, 0.75)
d_train <- data[idx$Resample1,]
d_train_P <- dataset[idx$Resample1, "P"]
tc <- trainControl("repeatedcv", 3, 1)

# library(doMC)
# registerDoMC(3)
res <- train(d_train, d_train_P, "svmLinear", trControl=tc )
res <- train(d_train, d_train_P, "svmLinear", trControl=tc )

If you leave the doMC lines commented out, both train() calls run and complete correctly. With the doMC lines uncommented, the first train() launches 3 rsession processes, one per core, each consuming 25% of the CPU, and completes successfully. The second one does the same, but its 3 rsession processes never complete. Eventually the CPU usage of all of them drops to 0%, but they don't terminate, so the app waits for them forever even though the computations are already done.

The same problem can be observed with the following code too:

res <- train(d_train, d_train_P, "svmLinear", trControl=tc )
library(doMC)
registerDoMC(3)
res <- train(d_train, d_train_P, "svmLinear", trControl=tc ) # this command never ends

P.S. I'm using RStudio's Session\Restart menu command to reset the R state. P.P.S. The problem is reproducible with registerDoMC(2), but disappears with registerDoMC(1).

zachmayer commented 10 years ago

Could you dput() a sample of your dataset and post a self-contained example that reproduces the error? Is it possible that the machine doesn't have enough memory to fit 3 models in parallel, but 2 is OK? SVMs can be very memory hungry. The following code works fine for me in a fresh R session (R: 3.1.1, caret_6.0-35, kernlab_0.9-19, doMC_1.3.3):

library(caret)
library(kernlab)

d_train <- iris[,-5]
d_train_P <- iris[,5]
tc <- trainControl("repeatedcv", 3, 1)

res <- train(d_train, d_train_P, "svmLinear", trControl=tc)
library(doMC)
registerDoMC(8)
res <- train(d_train, d_train_P, "svmLinear", trControl=tc)

trufanov-nok commented 10 years ago

Your sample works for me too. I have enough RAM for 4 rsessions (8 GB); the SVM takes slightly more than 1 GB for each model. In a few hours (I'm currently running some computations on 3 cores) I'll try to reproduce the problem again with a slightly smaller sample subset and provide dput()'ed objects.
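As a rough sketch of what sharing dput()'ed objects could look like: iris is only a stand-in for the real training data here, and the names small, txt and rebuilt are made up for illustration.

```r
# Take a small slice of a data frame (iris stands in for the real data).
small <- head(iris[, 1:4], 5)

# dput() writes an R expression that recreates the object;
# capture.output() collects that text so it can be pasted into a comment.
txt <- capture.output(dput(small))
cat(txt, sep = "\n")

# Whoever receives the text can rebuild the object by evaluating it.
rebuilt <- eval(parse(text = txt))

# Check that the round trip preserved the data.
print(all.equal(small, rebuilt))
```

Posting the text printed by cat() is usually enough for someone else to recreate the object without access to the original file.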

zachmayer commented 10 years ago

Thanks. It could be something particular about the dataset. If you can make a small example that runs in a few seconds and re-creates the bug, I can add a unit test so it won't happen again. Have you tried another parallel backend?
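One way to try a different backend is sketched below. It assumes the doParallel package is installed (it is not mentioned elsewhere in this thread) and uses iris in place of the reporter's data, so it only illustrates the mechanics and is not a confirmed fix. A PSOCK cluster starts separate worker processes instead of forking the session the way doMC does.

```r
library(caret)
library(doParallel)  # assumption: installed alongside caret and kernlab

# Start two socket-based worker processes and register them with foreach.
cl <- parallel::makePSOCKcluster(2)
registerDoParallel(cl)

tc <- trainControl(method = "repeatedcv", number = 3, repeats = 1)

# Run train() twice in a row, mirroring the sequence reported to hang.
res1 <- train(iris[, -5], iris[, 5], method = "svmLinear", trControl = tc)
res2 <- train(iris[, -5], iris[, 5], method = "svmLinear", trControl = tc)

# Shut the workers down and fall back to sequential execution.
parallel::stopCluster(cl)
foreach::registerDoSEQ()
```

If the second call completes with this backend but not with doMC, that would point at the forking backend or the R build rather than at train() itself. Setting allowParallel = FALSE in trainControl() is another way to rule parallelism in or out.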

topepo commented 10 years ago

I would also suggest testing without RStudio. I use it and love it, but there have been times when its settings have made an issue harder to diagnose.

The results of sessionInfo() would also help.

On Sun, Oct 19, 2014 at 12:31 PM, Zach Mayer notifications@github.com wrote:

Thanks. It could be something particular about the dataset. If you can make a small example that runs in a few seconds and re-creates the bug, I can add a unit test so it won't happen again.


trufanov-nok commented 10 years ago

I was able to reproduce it without RStudio. Unfortunately, I did that in Revolution R Open 8.0 beta, not vanilla R. If you can't confirm the problem on your side with the code below, I'll try to switch back to vanilla R somehow (I don't know how to do that yet).

Code used:

dataset <- read.csv("file:///home/truf/ml/afsis/training.csv", na.strings=c(".", "NA", "", "?"), strip.white=TRUE, encoding="UTF-8",header=TRUE,stringsAsFactors=FALSE)
train <- dataset[1:50,-c(1,3595:3600)]
test <- dataset[1:50,"P"]
library(caret)
tc <- trainControl("repeatedcv", 3, 1)
library(doMC)
registerDoMC(2)
res <- train(train, test, "svmLinear", trControl=tc )
res <- train(train, test, "svmLinear", trControl=tc )   # this command never ends

sessionInfo() output:

> sessionInfo()
R version 3.1.1 (2014-07-10)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=ru_RU.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=ru_RU.UTF-8        LC_COLLATE=ru_RU.UTF-8    
 [5] LC_MONETARY=ru_RU.UTF-8    LC_MESSAGES=ru_RU.UTF-8   
 [7] LC_PAPER=ru_RU.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=ru_RU.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
[1] kernlab_0.9-19  doMC_1.3.3      iterators_1.0.7 foreach_1.4.2  
[5] caret_6.0-35    ggplot2_1.0.0   lattice_0.20-29

loaded via a namespace (and not attached):
 [1] BradleyTerry2_1.0-5 brglm_0.5-9         car_2.0-21         
 [4] codetools_0.2-8     colorspace_1.2-4    compiler_3.1.1     
 [7] digest_0.6.4        grid_3.1.1          gtable_0.1.2       
[10] gtools_3.4.1        lme4_1.1-7          MASS_7.3-33        
[13] Matrix_1.1-4        minqa_1.2.3         munsell_0.4.2      
[16] nlme_3.1-117        nloptr_1.0.4        nnet_7.3-8         
[19] plyr_1.8.1          proto_0.3-10        Rcpp_0.11.3        
[22] reshape2_1.4        scales_0.2.4        splines_3.1.1      
[25] stringr_0.6.2       tools_3.1.1        

The train and test objects, saved with dput() (default parameters), can be found here: http://dfiles.ru/files/149qqz5cz

topepo commented 10 years ago

Thanks for the code and data. I could not reproduce the issue on my OS:

> library(caret)
Loading required package: lattice
Loading required package: ggplot2
> tc <- trainControl("repeatedcv", 3, 1, verboseIter= TRUE)
> library(doMC)
Loading required package: foreach
foreach: simple, scalable parallel programming from Revolution Analytics
Use Revolution R for scalability, fault tolerance and more.
http://www.revolutionanalytics.com
Loading required package: iterators
Loading required package: parallel
> registerDoMC(2)
> 
> res <- train(train, test, "svmLinear", trControl=tc )
Loading required package: kernlab
+ Fold1.Rep1: C=1 
+ Fold2.Rep1: C=1 
- Fold2.Rep1: C=1 
- Fold1.Rep1: C=1 
+ Fold3.Rep1: C=1 
- Fold3.Rep1: C=1 
Aggregating results
Fitting final model on full training set
> res
Support Vector Machines with Linear Kernel 
  50 samples
3593 predictors
No pre-processing
Resampling: Cross-Validated (3 fold, repeated 1 times) 
Summary of sample sizes: 33, 34, 33 
Resampling results
  RMSE  Rsquared  RMSE SD  Rsquared SD
  1.12  0.311     0.495    0.217      
Tuning parameter 'C' was held constant at a value of 1
> sessionInfo()
R Under development (unstable) (2014-10-02 r66711)
Platform: x86_64-apple-darwin10.8.0 (64-bit)
locale:
[1] en_US.ISO8859-1/en_US.ISO8859-1/en_US.ISO8859-1/C/en_US.ISO8859-1/en_US.ISO8859-1
attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods  
[8] base     
other attached packages:
[1] kernlab_0.9-19  doMC_1.3.3      iterators_1.0.7 foreach_1.4.2  
[5] caret_6.0-35    ggplot2_0.9.3.1 lattice_0.20-29
loaded via a namespace (and not attached):
 [1] BradleyTerry2_1.0-5 brglm_0.5-9         car_2.0-21         
 [4] codetools_0.2-9     colorspace_1.2-4    compiler_3.2.0     
 [7] digest_0.6.4        grid_3.2.0          gtable_0.1.2       
[10] gtools_3.4.1        lme4_1.1-7          MASS_7.3-35        
[13] Matrix_1.1-4        minqa_1.2.3         munsell_0.4.2      
[16] nlme_3.1-117        nloptr_1.0.4        nnet_7.3-8         
[19] plyr_1.8.1          proto_0.3-10        Rcpp_0.11.2        
[22] reshape2_1.4        scales_0.2.4        splines_3.2.0      
[25] stringr_0.6.2  

Can you run it outside of RStudio with the verboseIter option set and see if it hangs on a particular model iteration? (RStudio will prevent the logs from showing up in the console.)

trufanov-nok commented 10 years ago

It seems that this problem is specific to Revolution R. I uninstalled it, and the code works well in clean R. Then I installed the current RRO with dpkg -i RRO-8.0-Beta-Ubuntu-14.04.x86_64.deb and ran the code. It got stuck. The output with verboseIter=T is:

> res <- train(train, test, "svmLinear", trControl=tc ) 
Loading required package: kernlab
+ Fold1.Rep1: C=1 
+ Fold2.Rep1: C=1 
- Fold1.Rep1: C=1 
+ Fold3.Rep1: C=1 
- Fold2.Rep1: C=1 
- Fold3.Rep1: C=1 
Aggregating results
Fitting final model on full training set
> res <- train(train, test, "svmLinear", trControl=tc ) 
+ Fold1.Rep1: C=1 
+ Fold2.Rep1: C=1 

topepo commented 9 years ago

Seems like a Revo issue