topepo / caret

caret (Classification And Regression Training) R package that contains misc functions for training and plotting classification and regression models
http://topepo.github.io/caret/index.html

Calculation of Optimism-Corrected Performance #1326

Open bpuladi opened 1 year ago

bpuladi commented 1 year ago

I am currently working on the calculation of optimism-corrected performance. caret provides this via trainControl(method = "optimism_boot"). While reading through the code, I noticed the following:

caret/R/workflows.R, line 316:

    final_estimate <- out[ , paste0(perfName, "Apparent")] + optimism

That should be:

    final_estimate <- out[ , paste0(perfName, "Apparent")] - optimism

Excerpt from the book "Clinical Prediction Models" by Ewout Steyerberg, page 107, chapter 5.3.4 "Calculation of Optimism-Corrected Performance".

Optimism-corrected performance is calculated as: Optimism-corrected performance = Apparent performance in original sample - Optimism, where Optimism = Bootstrap performance - Test performance.

Or am I missing something?
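To make the book's definition concrete, here is a minimal base-R sketch of the procedure as I read it. It does not use caret and is not caret's implementation. The choice of lm() on mtcars, R^2 as the performance measure, 200 resamples, and the helper name r_squared are all my own assumptions for illustration.

```r
# Sketch of optimism-corrected bootstrap performance in base R.
# Assumptions (not taken from caret): lm() on mtcars, R^2 as the metric,
# B = 200 resamples, and the hypothetical helper r_squared().
set.seed(1)

# R^2 of a fitted model evaluated on an arbitrary data set
r_squared <- function(fit, data) {
  pred <- predict(fit, newdata = data)
  1 - sum((data$mpg - pred)^2) / sum((data$mpg - mean(data$mpg))^2)
}

form <- mpg ~ wt + hp + disp

# Apparent performance: model fitted and evaluated on the original sample
apparent <- r_squared(lm(form, data = mtcars), mtcars)

B <- 200
optimism <- replicate(B, {
  idx       <- sample(nrow(mtcars), replace = TRUE)
  boot_data <- mtcars[idx, ]
  boot_fit  <- lm(form, data = boot_data)
  boot_perf <- r_squared(boot_fit, boot_data)  # bootstrap performance
  test_perf <- r_squared(boot_fit, mtcars)     # test performance (original sample)
  boot_perf - test_perf                        # optimism for this resample
})

# Book formula: apparent performance minus the average optimism
corrected <- apparent - mean(optimism)
```

With this sign convention the optimism is subtracted. If the optimism were instead stored as test minus bootstrap performance, it would have to be added to give the same number.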

bpuladi commented 1 year ago

I have now also tested the code, and it does seem to be valid with the plus sign, so the thinking error was mine. To match the book's formula literally, the code in workflows.R would have to be rewritten with Boot and Orig swapped and the + changed to -, which is the same in content:

    optimism <- out[ , paste0(perfName, "Boot")] - out[ , paste0(perfName, "Orig")]
    final_estimate <- out[ , paste0(perfName, "Apparent")] - optimism

Since this can influence the results of studies, I would ask you to double-check it. Thanks a lot!
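Written out, the two forms differ only in where the sign sits. This assumes the existing code defines the optimism as Orig minus Boot, which is what the swap described above suggests. The symbols are my own shorthand: $A$ is the apparent performance, $B$ the bootstrap performance, and $O$ the performance of the bootstrap model on the original sample.

```latex
\begin{align*}
\text{corrected}
  &= A - \underbrace{(B - O)}_{\text{optimism as defined in the book}} \\
  &= A + \underbrace{(O - B)}_{\text{optimism with the opposite sign}}
\end{align*}
```

So a plus sign is consistent with the book as long as the optimism term is computed as $O - B$.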

bpuladi commented 1 year ago

I think there is still a problem: https://r-craft.org/r-news/part-2-optimism-corrected-bootstrapping-is-definitely-bias-further-evidence/