Open spedygiorgio opened 8 years ago
The main issue with doing this is that it would multiply the number of workers used. For example, if you requested M cores, most parallel processing technologies would end up using M^2 workers because of the nested structure of the calls.
This also happens with some models that can themselves run in parallel (e.g. ranger). That's generally why I have avoided it.
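To make the blow-up concrete, here is a hedged sketch (not code from caret itself): if the outer resampling loop uses M workers and the learner inside each worker is also multi-threaded, the two levels multiply. The `num.threads` pass-through to ranger is the only assumption here.

```r
## Hypothetical illustration of nested parallelism: train() runs resamples
## across M workers, while each worker's ranger fit also uses M threads,
## so roughly M * M = M^2 threads end up competing for the CPU.
library(doParallel)
library(caret)
library(ranger)

M <- parallel::detectCores()
registerDoParallel(cores = M)   # outer level: one worker per resample

fit <- train(Species ~ ., data = iris,
             method = "ranger",
             num.threads = M)   # inner level: M threads per model fit
```

In practice you would set `num.threads = 1` when the outer loop is already parallel, which is exactly the kind of coordination the nested structure makes hard to automate.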
I have some changes upcoming to preProcess that might mitigate some of these issues; you can pick subsets of predictors for specific methods (instead of having to do them all).
FYI, the doFuture backend, or more precisely the future framework, automatically protects against the kind of nested parallelism that would otherwise "blow up", so using doFuture would be safe in this sense. Moreover, users who have access to compute clusters can exploit such nested processing by specifying an explicit nested future strategy, e.g.
```r
library("future.batchtools")
plan(list(batchtools_sge, multiprocess))
```
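For a single machine, a minimal sketch of the protection mentioned above might look like this (assuming doFuture is registered as the foreach backend; the loop itself is purely illustrative):

```r
## Sketch: with doFuture, an inner %dopar% inside a worker sees a
## sequential plan by default, so nesting does not multiply processes
## unless an explicit nested plan (as above) says otherwise.
library(doFuture)
registerDoFuture()     # use futures as the foreach backend
plan(multisession)     # outer level runs in parallel

res <- foreach(i = 1:4) %dopar% {
  ## This inner loop runs sequentially inside each worker.
  foreach(j = 1:4) %dopar% (i * j)
}
```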
It could be useful to add a parallel backend to preProcess... Operations can be parallelized across columns, which could help in both estimation and prediction, especially when the data set has many features.
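A rough sketch of what column-wise parallelism could look like, using plain centering/scaling as a stand-in for preProcess's actual transformations (the `par_scale` helper is hypothetical, not part of caret):

```r
## Hypothetical column-parallel preprocessing: each column's statistics
## are estimated and applied in a separate future, which is where the
## gain would come from on wide data sets.
library(future.apply)
plan(multisession)

par_scale <- function(x) {
  scaled <- future_lapply(x, function(col) as.numeric(scale(col)))
  as.data.frame(scaled)
}

head(par_scale(mtcars))
```

For narrow data the scheduling overhead would likely dominate, so a real backend would probably only fan out past some column-count threshold.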