topepo / FES

Code and Resources for "Feature Engineering and Selection: A Practical Approach for Predictive Models" by Kuhn and Johnson
https://bookdown.org/max/FES
GNU General Public License v2.0

Leave-one-out cross validation is very valuable #107

Open TimothyMasters opened 2 years ago

TimothyMasters commented 2 years ago

Near the end of Section 3.4.1 the statement is made that leave-one-out cross validation is deprecated. Because this version is the most compute-intensive of all (training is done the most times, and the training sets are the largest, an expensive combination), and many modern model-fitting applications are themselves compute-intensive, it may occasionally be the case that leave-one-out is impractical due to limits on computing time.

However, when one is not faced with such limits, leave-one-out is usually the best possible choice for cross validation. This is because the stability of a model nearly always depends on training-set size. As an extreme example, even simple linear regression is undefined when there are fewer cases than coefficients. On a more practical note, neural networks can be wildly unstable with small training sets. Thus, in order to minimize the variance due to model variation, not to mention avoid outright degenerate training sets, it is nearly always in our best interest to make each fold's parameter-learning set as large as possible, which is what leave-one-out cross validation provides.
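
For concreteness, a minimal sketch of what leave-one-out looks like in practice, using base R, a linear model, and the built-in `mtcars` data (the data set and model are arbitrary choices for illustration):

```r
## Leave-one-out cross validation with a linear model on mtcars
## (illustrative only; the data set and model are placeholders)
data(mtcars)
n <- nrow(mtcars)

loo_pred <- numeric(n)
for (i in seq_len(n)) {
  fit <- lm(mpg ~ wt + hp, data = mtcars[-i, ])                     # train on n - 1 rows
  loo_pred[i] <- predict(fit, newdata = mtcars[i, , drop = FALSE])  # predict the single held-out row
}

## LOO estimate of the root mean squared error
sqrt(mean((mtcars$mpg - loo_pred)^2))
```

Each of the n fits uses n - 1 rows, the largest training set any resampling scheme can provide; caret exposes the same scheme via `trainControl(method = "LOOCV")`.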

topepo commented 2 years ago

I really don't want to come off as argumentative (I truly mean that). I do think that we have completely opposite opinions on a few of these topics.

I would agree with your assessment of LOO in situations where the data are pathologically small (say < 10-20 data points) and possibly when there are incredibly severe class imbalances. At any point above that subjective threshold, I would use many bootstrap or Monte Carlo resamples (to drive down the variance).

Also, I am dismissive of using any resampling in those small data situations for anything other than linear models. Talking about a neural network under those conditions implies that the data support that sort of low-bias model. Sure you can do it, but it's an incredibly bad idea to begin with.
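
For reference, a minimal sketch of the resampling schemes mentioned above, using the rsample package (the resample counts, split proportion, and data set are arbitrary choices):

```r
## Sketch of the resampling schemes discussed above, using rsample.
## The number of resamples and the data set are arbitrary.
library(rsample)

set.seed(101)

## Many bootstrap resamples: each analysis set is a sample of n rows drawn with replacement
boots <- bootstraps(mtcars, times = 100)

## Many Monte Carlo resamples: repeated random 80/20 splits
mc <- mc_cv(mtcars, prop = 0.8, times = 100)

## Leave-one-out: n resamples, each holding out a single row
loo <- loo_cv(mtcars)

nrow(boots)  # 100 resamples
nrow(mc)     # 100 resamples
nrow(loo)    # 32 resamples (one per row of mtcars)
```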

crossxwill commented 2 years ago

LOOCV is computationally prohibitive, which makes it unpopular. However, it has the least bias and the lowest variance among the alternatives (e.g., 3-fold, 5-fold, and 10-fold CV): https://github.com/crossxwill/K_fold_cv/blob/main/k_fold_cv.pdf (scroll to the bottom)
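
To make the comparison concrete, a sketch of how the different fold counts line up in caret (a single run on an arbitrary data set and model; assessing bias and variance, as in the linked PDF, would require repeating this over many simulated data sets):

```r
## Sketch: CV estimates of RMSE for several fold counts versus LOOCV, via caret.
## The data set, model, and seed are arbitrary; this shows the setup, not a benchmark.
library(caret)

set.seed(42)
kfold_rmse <- sapply(c(3, 5, 10), function(k) {
  ctrl <- trainControl(method = "cv", number = k)
  fit  <- train(mpg ~ wt + hp, data = mtcars, method = "lm", trControl = ctrl)
  fit$results$RMSE
})

loo_fit <- train(mpg ~ wt + hp, data = mtcars, method = "lm",
                 trControl = trainControl(method = "LOOCV"))

data.frame(scheme = c("3-fold", "5-fold", "10-fold", "LOOCV"),
           RMSE   = c(kfold_rmse, loo_fit$results$RMSE))
```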