Area Under the Minimum (AUM) of False Positives and Negatives

| [[file:tests/testthat][tests]] | [[https://github.com/tdhock/aum/actions][https://github.com/tdhock/aum/workflows/R-CMD-check/badge.svg]] | [[https://github.com/jimhester/covr][coverage]] | [[https://coveralls.io/github/tdhock/aum?branch=master][https://coveralls.io/repos/tdhock/aum/badge.svg?branch=main&service=github]] |

This R package provides an efficient C++ implementation of the [[https://jmlr.org/papers/v24/21-0751.html][AUM]], which can be used as a surrogate loss for optimizing Area Under the ROC Curve (AUC) in supervised binary classification and changepoint detection problems.

** Installation

+begin_src R

install.packages("aum")

OR:

if(!requireNamespace("remotes"))install.packages("remotes") remotes::install_github("tdhock/aum")

+end_src

** Usage

*** Converting binary labels to aum_diffs

The code below creates an =aum_diffs= data table which represents error functions for two labeled examples in binary classification.

+begin_src R

(bin.diffs <- aum::aum_diffs_binary(c(0,1))) example pred fp_diff fn_diff 1: 0 0 1 0 2: 1 0 0 -1

+end_src

The first row above means there is a false positive difference of 1 at a predicted value of 0. This is the error function for each example with a negative label in binary classification (no error if predicted value less than 0, changes up to 1 false positive for larger predicted values).
The second row means there is a false negative difference of -1 at a predicted value of 0. This is the error function for each example with a positive label in binary classification (1 false negative if predicted value less than 0, changes down to no errors for larger predicted values).

*** Computing AUM from aum_diffs table and prediction vector

Next we assume predicted values of 0 for both examples, and then compute Area Under the Minimum (AUM) of False Positives and False Negatives and its directional derivatives.

+begin_src R

aum::aum(bin.diffs, c(0,0)) $aum [1] 0

$derivative_mat [,1] [,2] [1,] 0 1 [2,] -1 0

+end_src

The result above is a named list with two elements.

=aum= is a numeric value giving the AUM for the specified error functions and predicted values.
=derivative_mat= is a matrix of directional derivatives, one row for each example (first column for left directional derivative, second column for right). In the example above we can see that decreasing the first prediction (entry 1,1) and/or increasing the second prediction (entry 2,2) results in no change to AUM. Since the right directional derivative of the first example is positive (entry 1,2), that implies an increased prediction would result in an increased AUM. Similarly the left directional derivative for the second example is negative (entry 2,1), indicating that a decreased prediction would result in an increased AUM.

*** Changepoint detection

See =?aum::aum_diffs_penalty= for documentation about how to compute the AUM for supervised penalty learning in changepoint detection problems.

*** Line search

An exact line search can be computed using time which is log-linear in the number of step sizes, see =?aum::aum_line_search= for a single line search, and =?aum::aum_linear_model_cv= for using the line search in each step of gradient descent when learning a linear model.

** Related Work

[[https://github.com/tdhock/penaltyLearning/blob/master/R/ROChange.R][penaltyLearning::ROChange]] provides an alternative implementation of AUM and its directional derivatives in R data.table code (slower, also includes ROC curve computation).
[[https://cloud.r-project.org/web/packages/aum/vignettes/speed-comparison.html][The speed comparison vignette]] has an alternative implementation of AUM and its directional derivatives in base R code.
[[https://github.com/tdhock/max-generalized-auc][max-generalized-auc]] repo provides code for making the figures in https://arxiv.org/abs/2107.01285 and https://arxiv.org/abs/2410.08635
Blogs about [[https://tdhock.github.io/blog/2022/torch-auto-grad-non-diff/][a pytorch implementation of AUM for binary classification]], and [[https://tdhock.github.io/blog/2024/torch-roc-aum/][computing AUC and AUM in torch]].

tdhock / aum

readme

+begin_src R

OR:

+end_src

+begin_src R

+end_src

+begin_src R

+end_src