Closed fraimondo closed 1 month ago
Attention: Patch coverage is 95.34884% with 2 lines in your changes missing coverage. Please review.
Project coverage is 89.89%. Comparing base (eb7207f) to head (cfb4936). Report is 6 commits behind head on main.
Files with missing lines | Patch % | Lines |
---|---|---|
julearn/utils/_cv.py | 60.00% | 1 Missing and 1 partial :warning: |
PR Preview Action v1.4.8: Preview removed because the pull request was closed. 2024-09-26 13:51 UTC
So far, when the user requested the final model, julearn fitted the model again on the full training data after calling scikit-learn's `cross_validate`. The main issue arises when using joblib to parallelize: there was one call for each outer CV fold and, once those were done, the main process fitted the final model. With enough resources this is suboptimal, as one might want to fit the final model at the same time as the individual folds.
This PR changes the internal logic so that the effect is the same but the fitting happens at a different time. The idea is to add an "extra" fold to the CV object that includes the whole dataset. After the call to `cross_validate` is done, we remove the last entry and use it as the final model, obtaining the same output while allowing the user to use joblib to parallelise across the CV folds and the final model together.
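
The extra-fold idea can be sketched with plain scikit-learn as follows. This is a minimal illustration, not the julearn implementation: the helper `splits_with_final_fold`, the toy dataset, and the choice to score the extra fold on all rows (a score that is then discarded) are assumptions made for the example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_validate


def splits_with_final_fold(cv, X, y):
    """Yield the regular CV splits, then one extra split that trains on all rows.

    Hypothetical helper for illustration only.
    """
    yield from cv.split(X, y)
    all_idx = np.arange(len(X))
    # Assumption: the extra fold is also scored on all rows; that score is
    # meaningless and is dropped after cross_validate returns.
    yield all_idx, all_idx


X, y = make_classification(n_samples=100, n_features=5, random_state=0)
cv = KFold(n_splits=5)

results = cross_validate(
    LogisticRegression(max_iter=1000),
    X,
    y,
    cv=list(splits_with_final_fold(cv, X, y)),
    return_estimator=True,
    n_jobs=2,  # the five CV folds and the full-data fit share the same job pool
)

# The last entry belongs to the extra fold: keep its estimator as the final
# model and strip it from every per-fold result.
final_model = results["estimator"][-1]
fold_results = {key: values[:-1] for key, values in results.items()}
```

Here `fold_results` holds only the five real folds, and `final_model` is the estimator trained on the whole dataset, so the caller sees the same output as with a separate final fit.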