speeding up cross-validation

Hi, I am wondering if there is an option to speed up cross-validation. Although I have access to an HPC cluster, it still takes very long, which (as I understand it) is because each chain is bound to one core.

Is it possible to split the cross-validation? For a 4-fold cross-validation I tried replacing folds 2, 3, and 4 in the partition returned by createPartition with NA, hoping that computePredictedValues would then only estimate fold 1. But this was not accepted:

Error in matrix(NA, sum(train), hM$nr) : 
  invalid 'nrow' value (too large or NA)

If this worked, I could do it for each fold separately (as separate jobs on the HPC) and combine the cross-validated measures of fit afterwards. I guess one thing that would work is to replace the other folds with a single other number instead of NA, so that I have 2 uneven folds; I would do that for each fold and then use the 4 smaller folds to summarize a 4-fold validation.

Best, Jörg

There is work in progress on more aggressive parallelization: k chains and n folds can be run in k × n parallel processes (if your hardware allows). This is not yet implemented for species cross-validation, which becomes slow with mcmcStep. Extending it to species cross-validation is an easy-ish task, but choosing between the two alternative ways of doing it needs some thought.
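The k × n layout can be illustrated with base R's parallel package. This is only a generic sketch, not the code in the branch: run_task is a hypothetical stand-in for sampling one chain of one fold, and the cap of two cores is arbitrary.

```r
library(parallel)

n_chains <- 3
n_folds  <- 4

# One row per (chain, fold) pair: 3 x 4 = 12 independent tasks.
tasks <- expand.grid(chain = seq_len(n_chains), fold = seq_len(n_folds))

# Hypothetical stand-in for "sample one chain of the model refitted without one fold".
run_task <- function(i) {
  set.seed(1000 * tasks$chain[i] + tasks$fold[i])  # a distinct seed per task
  data.frame(chain = tasks$chain[i], fold = tasks$fold[i],
             draw = mean(rnorm(100)))
}

# Forking is available on Unix-alikes only, so fall back to one core on Windows.
# The cap of 2 is illustrative; a real run would use more cores.
n_cores <- if (.Platform$OS.type == "windows") 1L else min(nrow(tasks), 2L)

results <- do.call(rbind,
                   mclapply(seq_len(nrow(tasks)), run_task, mc.cores = n_cores))
```

Each row of results comes from one (chain, fold) task, so the tasks can be spread over as many processes as there are rows in tasks.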

The experimental version is in a separate branch, parallel-CV-2 (number one was never public). It implements parallel processing in an alternative function, pcomputePredictedValues; the old version is untouched, remains available after you install the new one, and lets you compare results. Install it with devtools::install_github("hmsc-r/HMSC", ref="parallel-CV-2").
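A comparison of the two functions might look like the sketch below. It assumes that pcomputePredictedValues accepts the same partition argument as computePredictedValues plus an nParallel argument; check the help page in the branch before relying on that. compare_cv is a hypothetical helper, and m stands for a model you have already sampled with sampleMcmc.

```r
# Installation, as given above:
# devtools::install_github("hmsc-r/HMSC", ref = "parallel-CV-2")

# Hypothetical helper: run both functions on the same partition and
# return the measures of fit and elapsed times side by side.
compare_cv <- function(m, nfolds = 4, nParallel = 2) {
  partition <- Hmsc::createPartition(m, nfolds = nfolds)

  # Established function.
  t_old <- system.time(
    pred_old <- Hmsc::computePredictedValues(m, partition = partition)
  )

  # Experimental function; the nParallel argument is an assumption here.
  t_new <- system.time(
    pred_new <- Hmsc::pcomputePredictedValues(m, partition = partition,
                                              nParallel = nParallel)
  )

  list(old = Hmsc::evaluateModelFit(hM = m, predY = pred_old),
       new = Hmsc::evaluateModelFit(hM = m, predY = pred_new),
       seconds = c(old = t_old[["elapsed"]], new = t_new[["elapsed"]]))
}

# Usage (not run here, needs a fitted model):
# res <- compare_cv(m, nfolds = 4, nParallel = 8)
```

Because the random number sequences differ, the two sets of fit measures should be similar but not identical.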

There are caveats:

The new parallelization does not (yet) work on Windows: you need a Unix-like (POSIX) system, such as macOS or Linux (development was on macOS, and it was also tested on Linux). Windows parallelization will come, but it is the last stage of the development. This is not ideological but practical: implementing parallel processing on Windows is so much more complicated that I want to have a more finished and stable function before I even try to translate it to Windows.
The code has not been tested with large data sets. Parallelization puts greater demands on physical memory, and if the memory limit is exceeded, the function can stall and run very slowly. Experience and comments will be appreciated. (The Windows implementation will probably be even more memory hungry, but we'll see if we live.)
Random number sequences will be different, and even with the same RNG seed the results are not reproducible between the current and the new functions. If you compare the results, you will have to be satisfied with results that look similar.
If you have many CPUs, do not allocate all of them to cross-validation: your computer needs at least one for its regular background work, and it will take it from your parallel process when needed, which means a great slow-down. On my Mac mini M1 (ARM processor), using 5/8 cores was optimal and 8/8 took almost twice the time for three chains times three folds, or nine threads (but that is partly due to other peculiarities of the hardware). Also, if you have a parallelized BLAS, the advantages of parallel CV can be very small (seen on a 40-core Linux machine with parallelized OpenBLAS).
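The advice on CPU allocation can be turned into a small rule of thumb. pick_workers below is a hypothetical helper, not part of Hmsc: it keeps some cores in reserve and never asks for more workers than there are chain-by-fold tasks. The best reserve depends on the machine, so treat the numbers as a starting point.

```r
library(parallel)

# Hypothetical helper: suggest a worker count for n_chains x n_folds tasks.
pick_workers <- function(n_chains, n_folds, reserve = 1L,
                         available = detectCores()) {
  if (is.na(available)) available <- 1L        # detectCores() can return NA
  usable <- max(1L, as.integer(available) - as.integer(reserve))
  min(usable, as.integer(n_chains * n_folds))  # never more workers than tasks
}

pick_workers(3, 3, available = 8)               # 7: one core left for the system
pick_workers(3, 3, available = 8, reserve = 3)  # 5: the setting reported as fastest on the M1
pick_workers(3, 3, available = 40)              # 9: capped by the number of tasks
```

With a parallelized BLAS it may be better to use fewer workers still, since each worker can already occupy several cores.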

There are other tricks that we can try, but these need more experimentation. One problem is that we really do re-sample the original model at its full scale for each fold, so every fold takes about as long as the original sampleMcmc run. There may be some shortcuts to make this quicker, but this is something we need to discuss among developers.
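Why each fold costs a full refit, and why folds computed separately can still be combined, can be seen in a generic sketch. This is not Hmsc code: lm is used as a cheap stand-in for re-running the sampler, and all names are illustrative.

```r
set.seed(42)
n <- 40
dat <- data.frame(x = rnorm(n))
dat$y <- 1 + 2 * dat$x + rnorm(n)

# Random assignment of rows to 4 folds, 10 rows each.
partition <- sample(rep(1:4, length.out = n))

# One fold = one full refit on the training rows,
# followed by prediction for the held-out rows.
run_fold <- function(k) {
  val <- partition == k
  fit <- lm(y ~ x, data = dat[!val, ])   # stand-in for re-running the sampler
  list(rows = which(val),
       pred = unname(predict(fit, newdata = dat[val, ])))
}

# Each element could come from a separate job; here they run in sequence.
folds <- lapply(1:4, run_fold)

# Recombine: every row is predicted exactly once, by a model that did not see it.
cv_pred <- rep(NA_real_, n)
for (f in folds) cv_pred[f$rows] <- f$pred

cv_r2 <- cor(cv_pred, dat$y)^2   # cross-validated measure of fit
```

The folds only share the partition vector, so in principle each one could run as its own job as long as the held-out row indices are saved with the predictions.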

hmsc-r / HMSC

speeding up cross-validation #132