The original strategy for cross-validation was to fit the CV folds in parallel, and then fit the final model on a single thread once all the folds were finished. If there are cores and memory to spare, there is a significant speedup from fitting the final model in parallel alongside the CV folds.
The folds are only used to inform the optimal tree depth: no information is passed between the CV fold models and the final model. Nothing is lost from making the whole thing parallel.
Supplying n.cores = cv.folds + 1 yields a large speedup, about 30 to 40% (making the whole run roughly as fast as not using CV at all). When n.cores <= cv.folds, there is a more modest speedup: any thread that goes idle can pick up the work of fitting the final model earlier.
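As a concrete illustration, here is a minimal sketch of how a caller would request the extra core; it assumes the gbm package's gbm() interface with its cv.folds and n.cores arguments, and the data frame and column names are invented for the example.

```r
# Hedged sketch: assumes gbm::gbm() with cv.folds/n.cores arguments.
library(gbm)

set.seed(1)
df <- data.frame(x1 = runif(1000), x2 = runif(1000))
df$y <- df$x1 + rnorm(1000, sd = 0.1)

# With 5 CV folds, asking for 6 cores lets the final model fit run
# in parallel with the folds instead of waiting for them to finish.
fit <- gbm(y ~ x1 + x2, data = df, distribution = "gaussian",
           n.trees = 500, cv.folds = 5, n.cores = 6)

# The folds only inform model selection; the final model itself is
# fit on the full data, so nothing depends on fold/final ordering.
best.iter <- gbm.perf(fit, method = "cv", plot.it = FALSE)
```

With n.cores = 5 instead, the final model would simply be picked up by whichever of the five threads finishes its fold first, giving the smaller speedup described above.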