stephanJG opened this issue 2 years ago.
There is a work in progress with more aggressive parallelization: k chains and n folds can be run in k × n parallel processes (if your hardware allows). This is not yet implemented for species cross-validation, which becomes slow with `mcmcStep`. It is a fairly easy task to extend this to species cross-validation, but it needs some thought to choose between two alternative ways of doing it.
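The k × n scheme can be sketched with base R's `parallel` package: every (chain, fold) pair becomes one independent job. This is a minimal illustration under stated assumptions, not how `pcomputePredictedValues` is actually implemented; `fitOneChainOneFold` is a hypothetical placeholder for the real per-chain, per-fold sampling.

```r
library(parallel)

# Hypothetical worker, NOT the Hmsc API: in a real setup this would sample one
# MCMC chain on the training part of one fold and predict the held-out part.
fitOneChainOneFold <- function(chain, fold) {
  sprintf("chain %d, fold %d", chain, fold)
}

nChains <- 2
nFolds <- 4

# One row per (chain, fold) pair: k x n independent jobs
jobs <- expand.grid(chain = seq_len(nChains), fold = seq_len(nFolds))

# mclapply forks worker processes; forking is unavailable on Windows,
# so fall back to a single core there (or if the core count is unknown)
nCores <- detectCores()
if (is.na(nCores) || .Platform$OS.type == "windows") nCores <- 1L
nCores <- min(nCores, nrow(jobs))

results <- mclapply(seq_len(nrow(jobs)), function(i) {
  fitOneChainOneFold(jobs$chain[i], jobs$fold[i])
}, mc.cores = nCores)
```

Each job would still resample the model for its fold, so the wall-clock gain comes only from running the jobs concurrently, up to the number of available cores.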
The experimental version is in a separate branch, `parallel-CV-2` (number one was never public). It implements parallel processing in an alternative function, `pcomputePredictedValues`, so the old, untouched version is still available and will not vanish if you try the new one (this also lets you compare results). Install it with `devtools::install_github("hmsc-r/HMSC", ref="parallel-CV-2")`.
There are caveats:
There are other tricks that we can try, but these need more experimentation. One problem is that we really do resample the original model at full scale for each fold, and each fold takes about as long as the original `sampleMcmc` run. There may be shortcuts to make this quicker, but this is something we need to discuss among developers.
Hi, I am wondering if there is an option to speed up the cross-validation. Although I have access to an HPC, it still takes very long, which (as I understand it) is because each chain is bound to one core.
Is it possible to split the cross-validation? For a 4-fold cross-validation, I tried replacing folds 2, 3, and 4 in the `createPartition` output with NA, hoping that `computePredictedValues` would then only estimate fold 1. But this was not accepted:
If this worked, I could do this for each fold separately (as separate jobs on the HPC) and combine the cross-validated measures of fit afterwards. I guess one thing that would work is to replace the other folds with a single other number, so that I have 2 uneven folds; I would do that for each fold and then use the 4 smaller folds to assemble a 4-fold validation.
Best, Jörg
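Jörg's workaround can be sketched in base R under a few assumptions: the partition is an integer vector of fold labels 1..n (as returned by `createPartition`), and each job returns predictions as an array with sampling units in the first dimension and posterior samples in the third. `jobPartition` and `combineFoldPredictions` are hypothetical helpers, not part of Hmsc. Job k relabels its own fold as 1 and everything else as 2; afterwards only the rows of fold k are kept from job k's predictions.

```r
# Hypothetical helper (not part of Hmsc): two-fold partition for job k.
# Rows of the original fold k become fold 1 (held out and predicted by the
# model trained on all other rows); all remaining rows become fold 2.
jobPartition <- function(partition, k) {
  ifelse(partition == k, 1L, 2L)
}

# Hypothetical helper: stitch the per-job predictions back together.
# predList[[k]] holds job k's predictions; only the rows belonging to the
# original fold k are taken from it.
combineFoldPredictions <- function(partition, predList) {
  out <- array(NA_real_, dim = dim(predList[[1]]))
  for (k in seq_along(predList)) {
    rows <- which(partition == k)
    out[rows, , ] <- predList[[k]][rows, , , drop = FALSE]
  }
  out
}

# Toy example: 8 sampling units in 4 folds, 2 species, 3 posterior samples.
# Job k's fake predictions are filled with the value k.
partition <- rep(1:4, length.out = 8)
predList <- lapply(1:4, function(k) array(k, dim = c(8, 2, 3)))
combined <- combineFoldPredictions(partition, predList)
```

On the cluster, job k might call `computePredictedValues(m, partition = jobPartition(partition, k))` for a fitted model `m`. Each job still fits two models, the second trained on fold k alone, and that second model's predictions are simply discarded. The stitched array could then be passed to `evaluateModelFit(hM = m, predY = combined)`.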