tobigithub / caret-machine-learning

Practical examples for the R caret machine learning package
MIT License

memory bug in parallel GFS.THRIFT via caret #24

Open tobigithub opened 8 years ago

tobigithub commented 8 years ago

GFS.THRIFT from frbs_3.1-0, run under caret in parallel mode (cores > 4), uses excessive memory: about 10 GB per Rscript.exe process for a small example (a 10x10 matrix of about 3 kB). The example will either run or crash the RGui. Sequential use may be fine, but it is slow. Package source: http://dicits.ugr.es/software/FRBS/index.php

# load caret, DT and mlbench (mlbench provides the simulated Friedman 1 data)
library(caret); library(DT); library(mlbench)

set.seed(123)
simReg <- as.data.frame(mlbench.friedman1(10, sd = 1))
featurePlot(x=simReg[1:10], y=simReg$y)

trainIndex <- createDataPartition(y=simReg$y, p=0.7, list=FALSE, times = 1)
training_data <- simReg[trainIndex,]
testing_data <- simReg[-trainIndex,]

# all the training data (just named x and y)
y <- training_data$y
x <- training_data[, -ncol(training_data)]

# load doParallel and register a parallel backend with 8 workers
library(doParallel)
cl <- makeCluster(8)
registerDoParallel(cl)

# train GFS.THRIFT with caret defaults; memory use grows during this call
train(x, y, method = "GFS.THRIFT")

# stop the parallel processing and register sequential front-end
stopCluster(cl)
registerDoSEQ()
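
Since the report says the run can crash, and the problem seems tied to the number of cores, it may help to retry with fewer workers and make sure the cluster is always shut down. The following is only a minimal sketch: `with_workers` is a hypothetical helper written for this issue, not part of caret or doParallel, and whether a smaller worker count actually avoids the memory growth is an untested assumption.

```r
library(doParallel)  # also attaches foreach and parallel

# Hypothetical helper (not part of caret or doParallel): run `task` with a
# temporary cluster of `n_workers` and always clean up afterwards.
with_workers <- function(n_workers, task) {
  cl <- makeCluster(n_workers)
  # Cleanup runs even if `task` throws an error.
  on.exit({
    stopCluster(cl)
    registerDoSEQ()
  }, add = TRUE)
  registerDoParallel(cl)
  task()
}

# Cheap check that the helper registers the requested number of workers
# and falls back to the sequential backend when it returns.
workers_seen  <- with_workers(2, function() getDoParWorkers())
workers_after <- getDoParWorkers()
```

With `x` and `y` defined as above, the training call could then be wrapped as `with_workers(2, function() train(x, y, method = "GFS.THRIFT"))` and repeated with increasing worker counts to see where memory use starts to climb.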