stan-dev / projpred

Projection predictive variable selection
https://mc-stan.org/projpred/

Efficiency improvement: Avoid unnecessary re-projections in `cv_varsel()`'s final `varsel()` call #385

Closed fweber144 closed 1 year ago

fweber144 commented 1 year ago

This PR avoids an unnecessary final full-data performance evaluation (including costly re-projections if `refit_prj = TRUE`) in `cv_varsel()` with `validate_search = TRUE` or `cv_method = "kfold"`.

As a side effect, this also fixes a bug: the argument `thresh` was missing from `cv_varsel()`'s former `varsel()` call.

The KL divergence values (more precisely, the simplified cross-entropy values) along the full-data solution path are now set to `NA`. Calculating them would require exactly the kind of final full-data performance evaluation that this PR avoids (including costly re-projections if `refit_prj = TRUE`). projpred does not use these `ce` values anyway, and they would not be validated for the search, so they would in general be over-optimistic (although the bias should be about the same for all submodels, so model selection might still be possible with them). If users request these `ce` values in the future, we could think about performing such a final full-data performance evaluation conditionally, controlled by a new argument. Calculating them by default, however, does more harm than good.
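
For illustration, here is a minimal sketch of the two kinds of `cv_varsel()` calls affected by this change. The simulated data, the rstanarm reference model, and all tuning values (`nterms_max`, `nclusters`, `nclusters_pred`, `K`, `seed`) are assumptions chosen only to keep the example small. The element name `ce` used at the end is also an assumption about the structure of the returned object.

```r
# Sketch only: data, model, and tuning values are illustrative assumptions.
library(rstanarm)
library(projpred)

set.seed(1)
n <- 60
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
dat$y <- 1 + 0.8 * dat$x1 - 0.5 * dat$x2 + rnorm(n)

# Reference model (small settings to keep the run time short)
fit <- stan_glm(y ~ x1 + x2 + x3, data = dat, family = gaussian(),
                chains = 2, iter = 1000, refresh = 0, seed = 1)

# Case 1: LOO CV with the search repeated for each fold
cvvs_loo <- cv_varsel(fit, cv_method = "LOO", validate_search = TRUE,
                      refit_prj = TRUE, nterms_max = 3,
                      nclusters = 5, nclusters_pred = 10, seed = 1)

# Case 2: K-fold CV (the reference model is refitted K times)
cvvs_kfold <- cv_varsel(fit, cv_method = "kfold", K = 3,
                        refit_prj = TRUE, nterms_max = 3,
                        nclusters = 5, nclusters_pred = 10, seed = 1)

# The cross-validated performance statistics are unaffected by this PR
print(summary(cvvs_loo))

# Assumption: the cross-entropy values along the full-data solution path are
# stored in an element named `ce`; check names(cvvs_loo) to confirm.
print(cvvs_loo$ce)
```

With `validate_search = FALSE` and `cv_method = "LOO"`, neither condition named in the description holds, so such a call falls outside the scope stated for this PR.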