tidymodels / yardstick

Tidy methods for measuring model performance
https://yardstick.tidymodels.org/

yardstick and caret give different area under PRC. Which is right? #449

Closed: jarbet closed this issue 10 months ago

jarbet commented 10 months ago

The problem

Yardstick and caret give different estimates of the area under the precision-recall curve for multi-class problems. In some of my work, I'm seeing substantial differences (> 0.3) between the two packages. Any thoughts as to why they differ, or which method should be preferred?

Reproducible example

suppressPackageStartupMessages(library(yardstick))
suppressPackageStartupMessages(library(caret))

data(hpc_cv)
hpc_cv <- hpc_cv[hpc_cv$Resample == 'Fold04',]
yardstick::pr_auc(hpc_cv, obs, VF, `F`, M, L, estimator = 'macro')
#> # A tibble: 1 × 3
#>   .metric .estimator .estimate
#>   <chr>   <chr>          <dbl>
#> 1 pr_auc  macro          0.680
caret::multiClassSummary(data = hpc_cv, lev = c('VF', 'F', 'M', 'L'))['prAUC']
#>     prAUC 
#> 0.6585514

Created on 2023-10-10 by the reprex package (v2.0.1)

EmilHvitfeldt commented 10 months ago

Hello @jarbet! 👋

I spent some time looking into the differences. {caret} uses {MLmetrics} to do the calculations, whereas {yardstick} does the calculation itself. They mostly agree with each other, although {MLmetrics} calculates the precision to be NaN when the threshold is Inf (at that threshold no observations are predicted positive, so precision is 0/0), and {yardstick} sets that value to 1. This by itself can lead to slight differences.

Other than that, {yardstick} and {MLmetrics} agree.
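To show why the precision at threshold Inf matters, here is a minimal base-R sketch on a tiny, made-up binary example. It assumes both conventions integrate the PR curve with the trapezoidal rule, and that an implementation producing NaN effectively drops that point before integrating. That second part is our assumption for illustration, not a documented detail of {MLmetrics}. The helpers `pr_points()` and `trapezoid_auc()` are hypothetical and are not the code of either package.

```r
# Hypothetical helper: precision-recall points for a binary problem.
# One point per distinct score (used as a ">=" threshold, highest first),
# preceded by the threshold = Inf point, where nothing is predicted
# positive and precision is therefore 0/0.
pr_points <- function(truth, score, first_precision) {
  thresholds <- sort(unique(score), decreasing = TRUE)
  n_pos <- sum(truth)
  recall <- precision <- numeric(length(thresholds))
  for (i in seq_along(thresholds)) {
    predicted_pos <- score >= thresholds[i]
    tp <- sum(predicted_pos & truth)
    recall[i] <- tp / n_pos
    precision[i] <- tp / sum(predicted_pos)
  }
  data.frame(recall    = c(0, recall),
             precision = c(first_precision, precision))
}

# Trapezoidal area under the curve; points with a missing (NA/NaN)
# precision are dropped before integrating (assumed behaviour).
trapezoid_auc <- function(x, y) {
  keep <- !is.na(y)
  x <- x[keep]
  y <- y[keep]
  sum(diff(x) * (head(y, -1) + tail(y, -1)) / 2)
}

# Made-up toy data: logical truth and a predicted score per observation.
truth <- c(TRUE, FALSE, TRUE, TRUE, FALSE)
score <- c(0.9, 0.8, 0.7, 0.4, 0.2)

# Convention 1: precision at threshold Inf set to 1.
first_one <- pr_points(truth, score, first_precision = 1)
auc_first_one <- trapezoid_auc(first_one$recall, first_one$precision)

# Convention 2: precision at threshold Inf left as NaN (point dropped).
first_nan <- pr_points(truth, score, first_precision = NaN)
auc_first_nan <- trapezoid_auc(first_nan$recall, first_nan$precision)

c(first_one = auc_first_one, first_nan = auc_first_nan)
```

Here the two areas are 55/72 (about 0.764) and 31/72 (about 0.431). The whole gap is the first trapezoid, from recall 0 up to the recall of the top-scored observations. It is large in this toy case because that first step covers a third of the positives. With many positives and few ties at the top score it shrinks, which matches the "slight differences" described above.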

{caret} does a little extra, as seen here: https://github.com/topepo/caret/blob/5f4bd2069bf486ae92240979f9d65b5c138ca8d4/pkg/caret/R/postResample.R#L231-L232. It calculates a custom one-vs-all ROC curve for each class and takes the mean of the resulting metrics, which again leads to different results.
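"One-vs-all per class, then take the mean" can be sketched in base R as follows. Each class in turn is treated as the positive class and its probability column is used as the score. A binary PR AUC is computed for each, and the results are averaged without weights. This is a minimal illustration on made-up data, using the trapezoidal rule with precision set to 1 at threshold Inf (the {yardstick} convention mentioned above). The helper names `binary_pr_auc()` and `macro_pr_auc()` are hypothetical, and this is not the actual code of {caret} or {yardstick}.

```r
# Hypothetical helper: trapezoidal PR AUC for one binary (one-vs-all)
# problem, with precision at the threshold = Inf point set to 1.
binary_pr_auc <- function(truth, score) {
  thresholds <- sort(unique(score), decreasing = TRUE)
  tp     <- vapply(thresholds, function(t) sum(score >= t & truth), integer(1))
  n_pred <- vapply(thresholds, function(t) sum(score >= t), integer(1))
  recall    <- c(0, tp / sum(truth))
  precision <- c(1, tp / n_pred)
  sum(diff(recall) * (head(precision, -1) + tail(precision, -1)) / 2)
}

# Hypothetical helper: one-vs-all PR AUC for every class (one column of
# class probabilities per class), then an unweighted (macro) mean.
macro_pr_auc <- function(obs, probs) {
  per_class <- vapply(colnames(probs), function(cls) {
    binary_pr_auc(truth = obs == cls, score = probs[, cls])
  }, numeric(1))
  list(per_class = per_class, macro = mean(per_class))
}

# Made-up toy data: 3 classes, 6 observations, rows sum to 1.
obs <- factor(c("a", "a", "b", "b", "c", "c"))
probs <- matrix(
  c(0.7, 0.2, 0.1,
    0.3, 0.4, 0.3,
    0.2, 0.6, 0.2,
    0.1, 0.5, 0.4,
    0.2, 0.3, 0.5,
    0.4, 0.1, 0.5),
  ncol = 3, byrow = TRUE,
  dimnames = list(NULL, c("a", "b", "c"))
)

res <- macro_pr_auc(obs, probs)
res$per_class
res$macro
```

Classes "b" and "c" are ranked perfectly here (area 1), while class "a" gets 19/24, so the macro average is 67/72. Two implementations can both follow this recipe and still disagree if they build each one-vs-all curve differently (threshold handling, the Inf point, ties, or integration rule). In the multi-class setting, any such small per-class difference is carried into the mean.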

I would recommend using {yardstick}; its results are also tested against other libraries, such as scikit-learn: https://github.com/tidymodels/yardstick/blob/main/tests/testthat/test-prob-pr_auc.R#L30-L57

jarbet commented 10 months ago


Thanks for explaining! I think this solves my issue.

github-actions[bot] commented 10 months ago

This issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.