tidymodels / tune

Tools for tidy parameter tuning
https://tune.tidymodels.org

To collect metrics for the analysis datasets with fit_resamples as well: following AI guidelines #842

Closed AlbertoImg closed 8 months ago

AlbertoImg commented 8 months ago

Hi tidymodels team,

Following up on the closed issue #215 (How to collect the metrics for the analysis datasets?) and opening the discussion again, I would say it is worth implementing the collection of metrics on the analysis (training) datasets in fit_resamples. Reasons:

1. It helps to investigate and spot overfitting. Currently, the metrics we get from fit_resamples may look good enough while hiding an overfitting problem. Looking at the metrics on the training sets as well would tell us whether the model has already overfit during training, in which case a review of the workflow should be considered.
2. There are AI guidelines that suggest reporting performance at both the training and testing steps for more workflow transparency (e.g. Collins, Gary S., et al. "Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD): the TRIPOD statement." Circulation 131.2 (2015): 211-219, https://doi.org/10.1186/s12916-014-0241-z).

Of course, the implementation should clearly differentiate between the training fold and the validation fold for each metric, to avoid "optimistic" performance reporting (i.e., accidentally picking up the training metrics).
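In the meantime, one can iterate over the resamples manually and compute metrics on both sets. A minimal sketch, assuming a regression workflow scored with rmse (the object names, mtcars example, and the train_vs_val helper are illustrative, not a proposed API):

```r
library(tidymodels)

set.seed(123)
folds <- vfold_cv(mtcars, v = 5)

wf <- workflow() %>%
  add_formula(mpg ~ .) %>%
  add_model(linear_reg())

# Fit each resample on its analysis set, then score that fit on both
# the analysis (training) and assessment (validation) sets.
train_vs_val <- purrr::map_dfr(folds$splits, function(split) {
  fit_wf <- fit(wf, analysis(split))
  dplyr::bind_rows(
    rmse(augment(fit_wf, analysis(split)), mpg, .pred) %>%
      dplyr::mutate(set = "analysis"),
    rmse(augment(fit_wf, assessment(split)), mpg, .pred) %>%
      dplyr::mutate(set = "assessment")
  )
}, .id = "fold")

# A large gap between the analysis and assessment rows within a fold
# is the overfitting signal described above.
train_vs_val
```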

Best,
Alberto

simonpcouch commented 8 months ago

Thanks for the issue.

> Currently, the metrics we get from fit_resamples may look good enough while hiding an overfitting problem.

In the case of an overfit model, the metrics for the assessment set will be poor. In other words, models that do not overfit will have stronger assessment set metrics than models that overfit.

The big picture in Collins et al seems to be that reports on model development ought to "include some form of internal validation to quantify any optimism in the predictive performance" that would result from assessing models using data they were trained with. Resampling seems in scope here.
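For completeness, that internal validation is exactly what fit_resamples already reports: its metrics come from the assessment sets, so the optimism of training-set performance never enters them. A minimal sketch, reusing the example folds and workflow from the earlier comment:

```r
# Metrics returned here are computed on the assessment (held-out)
# portion of each resample, summarized across folds.
res <- fit_resamples(wf, resamples = folds)
collect_metrics(res)
```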

github-actions[bot] commented 8 months ago

This issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.