Closed UnixJunkie closed 4 years ago
The argument quote(parallel::clusterCall(parSpec, library, "pls", character.only = TRUE, warn.conflicts = FALSE)) to lapplyFunc is an unevaluated piece of code (because of the quote()) that is evaluated inside lapplyFunc, where parSpec is defined. Also note that it is only used if you have manually created a cluster that does not use fork to create the processes (for instance with pls.options(parallel = makeCluster(8, type = "PSOCK")) for 8 processes). If the parallel option is anything else, for instance NULL or a number, this piece of code is never evaluated. Also, this is only in the crossval() function, which is not how you normally use cross-validation in the pls package. The normal way is to use something like model <- plsr(...., validation = "CV"), which does not use the crossval() function.
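To illustrate how an expression wrapped in quote() stays unevaluated until it is evaluated somewhere a name like parSpec exists, here is a minimal base R sketch. The function run_with_setup and the placeholder value are hypothetical stand-ins for illustration only; this is not the actual code of lapplyFunc in the pls package.

```r
# Hypothetical stand-in for a function like lapplyFunc: it defines parSpec
# locally and evaluates the expression it receives inside its own frame.
run_with_setup <- function(setup_expr) {
  parSpec <- "cluster-object-placeholder"  # placeholder, not a real cluster
  eval(setup_expr)  # evaluated here, so parSpec resolves to the local value
}

# quote() captures the call without evaluating it; parSpec does not need to
# exist at this point.
expr <- quote(paste("using", parSpec))

result <- run_with_setup(expr)
print(result)
```

Evaluating expr directly at top level would fail if no parSpec is defined there, which is why the call is passed quoted.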
By default, the cross-validation in the pls package is done serially, i.e., without parallelisation (the parallel option is NULL by default). Perhaps you have linked your R against a BLAS/LAPACK library that uses parallelisation? If you see one R process using several CPU cores, then that is most likely the reason. R itself is (mostly) single-threaded, and the parallelisation in the pls package uses processes, not threads.
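One way to check which BLAS/LAPACK libraries an R session is linked against is sketched below, using only base R. It assumes R 3.4.0 or later (where extSoftVersion() reports the BLAS path); the output is system-dependent, and the path alone does not prove whether the library is threaded.

```r
# Path of the BLAS library in use (may be an empty string on some platforms).
blas_path <- extSoftVersion()[["BLAS"]]

# Path of the LAPACK library in use.
lapack_path <- La_library()

# Number of cores R detects on this machine (may be NA on some systems).
n_cores <- parallel::detectCores()

cat("BLAS:  ", blas_path, "\n")
cat("LAPACK:", lapack_path, "\n")
cat("Cores: ", n_cores, "\n")
```

If the BLAS path points to a library such as OpenBLAS or MKL rather than R's reference BLAS, multi-core use by a single R process during matrix operations would be consistent with the explanation above.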
I use this line exactly:
model <- plsr(y ~ x, method = "simpls", data = train_data, validation = "CV", segments = 5)
How can I limit the number of cores used in that case?
As long as pls.options()$parallel returns NULL, the cross-validation is done serially, using only one R process. As I wrote, you probably have R linked with a BLAS/LAPACK library that is threaded. How you control the number of threads that library uses is somewhat outside the scope of the pls package. Many threaded libraries look for an environment variable OMP_NUM_THREADS to determine the number of threads, so you could try setting that to 1. There is also an R package, RhpcBLASctl (https://cran.r-project.org/web/packages/RhpcBLASctl/index.html), that should be able to control it from within R itself. You might want to give it a try.
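The two suggestions above could be tried roughly as follows. This is a sketch, not a guaranteed fix: it assumes the linked BLAS honours OMP_NUM_THREADS (many libraries read it only when they are loaded, so setting it in the shell before starting R is usually more reliable), and the RhpcBLASctl and pls calls are guarded so the snippet still runs when those packages are not installed.

```r
# Request a single thread via the environment variable. Setting it inside a
# running session may be too late for a BLAS that has already been loaded.
Sys.setenv(OMP_NUM_THREADS = "1")

# If RhpcBLASctl is installed, limit BLAS and OpenMP threads from within R.
if (requireNamespace("RhpcBLASctl", quietly = TRUE)) {
  RhpcBLASctl::blas_set_num_threads(1)
  RhpcBLASctl::omp_set_num_threads(1)
}

# If pls is installed, confirm that its own parallelisation is switched off;
# NULL here means the cross-validation runs serially.
if (requireNamespace("pls", quietly = TRUE)) {
  print(pls::pls.options()$parallel)
}
```

Running the plsr(...) call after this and watching CPU usage should show whether the threaded BLAS was the cause.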
It should probably be a parameter of the crossval function.
The effect I notice is that all cores of my computer are used when I perform cross-validation, no matter how I try to limit that.