khliland / pls

The pls R package

line 238 of crossval.R: parSpec is uninitialized #24

Closed UnixJunkie closed 4 years ago

UnixJunkie commented 4 years ago

it should probably be a parameter of the crossval function

The effect I notice is that all cores of my computer are used when I perform cross-validation, no matter how I try to limit that.

bhmevik commented 4 years ago

The argument quote(parallel::clusterCall(parSpec, library, "pls", character.only = TRUE, warn.conflicts = FALSE)) to lapplyFunc is an unevaluated piece of code (because of the quote()) that is evaluated inside lapplyFunc, where parSpec is defined. Also note that it only gets used if you have manually created a cluster that does not use fork to create the processes (for instance with pls.options(parallel = makeCluster(8, type = "PSOCK")) for 8 processes). If the parallel option is anything else, for instance NULL or a number, this piece of code is never evaluated. Also, this is only in the crossval() function, which is not how you normally use cross-validation in the pls package. The normal way is to use something like model <- plsr(...., validation = "CV"), which does not use the crossval() function.
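To illustrate why a quoted expression can mention a variable that is not defined at the call site, here is a minimal sketch in base R. The function run_with_setup and the placeholder value are hypothetical stand-ins and are not the actual lapplyFunc from the pls source; only the quote()/eval() mechanism is the point.

```r
# Hypothetical stand-in for an internal helper; NOT the pls implementation.
run_with_setup <- function(setup_expr) {
  # parSpec exists only inside this function (placeholder for a cluster object)
  parSpec <- "cluster-placeholder"
  # eval() evaluates the quoted code in this function's frame, where parSpec is defined
  eval(setup_expr)
}

# quote() captures the expression without evaluating it, so mentioning
# parSpec here is not an error even though it is undefined at top level.
result <- run_with_setup(quote(paste("using", parSpec)))
print(result)
```

The expression is only ever evaluated inside the helper, so no parSpec variable is needed in the caller.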

By default, the cross-validation in the pls package is done in serial, i.e., without parallelisation (the parallel option is NULL by default). Perhaps you have linked your R against a BLAS/LAPACK library that uses parallelisation? If you see that you have one R process using several CPU cores, then that is most likely the reason. R itself is (mostly) single-threaded, and the parallelisation in the pls package uses processes, not threads.
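One way to check both points from within R is sketched below. It assumes R 3.4.0 or later (for extSoftVersion()[["BLAS"]] and La_library()); the reported paths depend on how R was built, and whether a given library is threaded has to be judged from its name or documentation.

```r
# Path of the BLAS library in use (may be "" if R cannot determine it)
blas_path <- extSoftVersion()[["BLAS"]]
# Path of the LAPACK library in use
lapack_path <- La_library()
cat("BLAS:  ", blas_path, "\n")
cat("LAPACK:", lapack_path, "\n")

# Only if the pls package is installed: TRUE here means cross-validation is serial
if (requireNamespace("pls", quietly = TRUE)) {
  print(is.null(pls::pls.options()$parallel))
}
```

A path pointing at something like OpenBLAS or MKL rather than R's reference BLAS would be consistent with the multi-core usage described above.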

UnixJunkie commented 4 years ago

I use this line exactly:

model <- plsr(y ~ x, method = "simpls", data = train_data, validation = "CV", segments = 5)

How can I limit the number of cores used in that case?

bhmevik commented 4 years ago

As long as pls.options()$parallel returns NULL, the cross-validation is done serially, using only one R process. As I wrote, you probably have R linked with a BLAS/LAPACK library that is threaded. How you control the number of threads that library uses is somewhat outside the scope of the pls package. Many threaded libraries look for the environment variable OMP_NUM_THREADS to determine the number of threads, so you could try setting that to 1. There is also an R package, RhpcBLASctl (https://cran.r-project.org/web/packages/RhpcBLASctl/index.html), that should be able to control it from within R itself. You might want to give it a try.
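A minimal sketch of both suggestions, assuming a threaded BLAS that honours these settings. Many libraries read OMP_NUM_THREADS only when they are loaded, so setting it in the shell before starting R is likely more reliable than Sys.setenv(); the RhpcBLASctl calls are skipped if that package is not installed.

```r
# Environment variable read by many OpenMP-based libraries
# (may have no effect if the library was already initialised)
Sys.setenv(OMP_NUM_THREADS = "1")

# Runtime control via RhpcBLASctl, if it is available
if (requireNamespace("RhpcBLASctl", quietly = TRUE)) {
  RhpcBLASctl::blas_set_num_threads(1)  # threads used by the BLAS library
  RhpcBLASctl::omp_set_num_threads(1)   # threads used by OpenMP regions
}

# Afterwards, fit the model as before, e.g.
# model <- plsr(y ~ x, method = "simpls", data = train_data, validation = "CV", segments = 5)
```

Whether this limits CPU usage depends on which BLAS/LAPACK R is linked against, so it is worth watching the process in a system monitor while the cross-validation runs.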