Update 20-ensemble-models.Rmd

tidymodels / TMwR

Code and content for "Tidy Modeling with R"

https://tmwr.org

Update 20-ensemble-models.Rmd #335

Closed: xiaochi-liu closed this pull request 1 year ago

xiaochi-liu commented 1 year ago

Not sure whether I understand correctly here. We are using five repeats of 10-fold cross-validation, so shouldn't we have $5 \times 10 = 50$ assessment sets?

juliasilge commented 1 year ago

I'm pretty sure this one is right as published.

One repeat of 10-fold (or 3-fold, etc.) CV generates one assessment set prediction for each training set sample.

Notice that each sample from the training set gets a prediction one time, when it is in the assessment set ("estimate performance using...").

If we repeated that 5 times, we would get five assessment set predictions for each observation in the training set.
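The single-repeat case can be sketched in a few lines. This is plain Python, not rsample code: the helper name `vfold_assessment_sets` and the sizes `n = 30`, `v = 10` are assumptions chosen only for illustration. It splits the row indices into disjoint assessment sets and counts how often each row lands in one.

```python
import random
from collections import Counter


def vfold_assessment_sets(n, v, seed=0):
    """Split row indices 0..n-1 into v disjoint assessment sets (one repeat).

    Hypothetical helper for illustration; it is not part of any library.
    """
    rng = random.Random(seed)
    rows = list(range(n))
    rng.shuffle(rows)
    # Deal the shuffled rows round-robin into v folds.
    return [rows[i::v] for i in range(v)]


n, v = 30, 10
folds = vfold_assessment_sets(n, v)

# Count how many assessment sets each training set row appears in.
counts = Counter(row for fold in folds for row in fold)

print(len(folds))            # 10 assessment sets
print(set(counts.values()))  # {1}: each row is predicted exactly once
```

Within one repeat the assessment sets partition the training set, which is why every row receives exactly one prediction.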

xiaochi-liu commented 1 year ago

Thank you very much for your kind guidance, Julia!

Is there a different understanding of "the assessment set"? Based on this figure:

My understanding is like this:

When we do 3-fold cross-validation, we get 3 assessment sets. These 3 assessment sets bind together so that each sample from the training set gets a prediction. Thus, if we do 5 repeats of 3-fold cross-validation, we will have $5 \times 3 = 15$ assessment sets.

juliasilge commented 1 year ago

This is exactly right:

> When we do 3-fold cross-validation, we get 3 assessment sets. These 3 assessment sets bind together so that each sample from the training set gets a prediction. Thus, if we do 5 repeats of 3-fold cross-validation, we will have $5 \times 3 = 15$ assessment sets.

With 3-fold CV, there are 3 assessment sets and each training set observation is in one of these, so each training set observation gets one prediction, when it is in the assessment set. There are no predictions when something is in the analysis set.

If we repeat that 5 times, there will be 15 assessment sets. Each training set observation will be in 5 assessment sets (one at each repeat), so there will be 5 predictions made for each training set observation.
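The two counts in this explanation (number of assessment sets versus predictions per observation) can be checked with a small simulation. This is a sketch in plain Python rather than rsample; the function name `repeated_vfold` and the training set size `n = 12` are assumptions made only for the example.

```python
import random
from collections import Counter


def repeated_vfold(n, v, repeats, seed=0):
    """Return every assessment set from `repeats` repeats of v-fold CV.

    Hypothetical helper for illustration; it is not part of any library.
    """
    rng = random.Random(seed)
    assessment_sets = []
    for _ in range(repeats):
        rows = list(range(n))
        rng.shuffle(rows)  # a fresh random partition for each repeat
        assessment_sets.extend(rows[i::v] for i in range(v))
    return assessment_sets


assessment_sets = repeated_vfold(n=12, v=3, repeats=5)

# How many times does each training set row receive a prediction?
per_row = Counter(row for s in assessment_sets for row in s)

print(len(assessment_sets))                  # 15 = 5 repeats x 3 folds
print(set(per_row.values()))                 # {5}: one prediction per repeat
print(sum(len(s) for s in assessment_sets))  # 60 = 12 rows x 5 repeats
```

The number of assessment sets grows with both the folds and the repeats, while the number of predictions per observation depends only on the repeats.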

xiaochi-liu commented 1 year ago

Got it. Now I totally understand. Thank you very much, Julia!
