online-ml / river

🌊 Online machine learning in Python
https://riverml.xyz
BSD 3-Clause "New" or "Revised" License

Efficient Rolling AUC-PR implementation #1543

Open davidlpgomes opened 2 months ago

davidlpgomes commented 2 months ago

A C++ implementation of the Prequential/Rolling AUC-PR, compiled through Cython.

It keeps a sliding window of size S and computes the exact (i.e., not approximated) AUC-PR over the last S instances seen.

Based on Gomes, Grégio, Alves, and Almeida, 2023.
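To make the sliding-window idea concrete, here is a minimal pure-Python sketch. It is not the algorithm from the paper or the C++ code in this PR: the class name `RollingAveragePrecision` and its `update`/`get` methods are hypothetical, and it assumes the step-wise (average-precision) definition of AUC-PR, which can differ from a trapezoidal one. It keeps the window sorted by score so each query is a single pass with no re-sorting, and it treats tied scores as one threshold.

```python
from bisect import bisect_left, insort
from collections import deque


class RollingAveragePrecision:
    """Exact AUC-PR (average-precision form) over the last `window_size` samples.

    Illustrative sketch only; names and structure are assumptions.
    """

    def __init__(self, window_size):
        self.window_size = window_size
        self._fifo = deque()  # arrival order, used to find the sample to evict
        self._sorted = []     # (score, label) pairs in ascending score order
        self._n_pos = 0       # number of positive labels currently in the window

    def update(self, y_true, score):
        item = (float(score), int(bool(y_true)))
        if len(self._fifo) == self.window_size:
            old = self._fifo.popleft()
            # bisect_left lands on an element equal to `old`, so delete it.
            del self._sorted[bisect_left(self._sorted, old)]
            self._n_pos -= old[1]
        self._fifo.append(item)
        insort(self._sorted, item)
        self._n_pos += item[1]

    def get(self):
        if self._n_pos == 0:
            return 0.0  # assumption: report 0 when the window has no positives
        ap, tp, fp, prev_recall = 0.0, 0, 0, 0.0
        i = len(self._sorted) - 1
        while i >= 0:  # walk thresholds from the highest score down
            threshold = self._sorted[i][0]
            # Consume every sample tied at this score as one threshold step.
            while i >= 0 and self._sorted[i][0] == threshold:
                if self._sorted[i][1]:
                    tp += 1
                else:
                    fp += 1
                i -= 1
            recall = tp / self._n_pos
            ap += (recall - prev_recall) * (tp / (tp + fp))
            prev_recall = recall
        return ap


metric = RollingAveragePrecision(window_size=4)
for y, s in [(1, 0.9), (0, 0.8), (1, 0.7), (0, 0.6)]:
    metric.update(y, s)
print(metric.get())
```

In this sketch an update costs O(S) because of the list insertion and deletion, and a query costs O(S); the point is only that the window never has to be re-sorted from scratch.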

AdilZouitine commented 2 months ago

Hey, great contribution! 😄 Could you provide some benchmarks showing how much faster the rolling AUC calculation is?

davidlpgomes commented 2 months ago

Hey @AdilZouitine, thanks! My team and I are the authors of the paper mentioned above.

In the paper, we ran several experiments on various stream datasets, comparing our prequential algorithm with the batch version (and with scikit-learn's batch implementation). On average, with a window of size 1000, our algorithm was 13 times faster and used 12 times less energy than the batch algorithm.

I will implement a simple stream experiment comparing the time spent to calculate the AUC-PR using our prequential algorithm and the batch version. I'll send the link to the repository when I'm done 😃

davidlpgomes commented 2 months ago

Hey @AdilZouitine, the benchmarks (code and results) comparing the Rolling AUC-PR and the Batch AUC-PR are available in my benchmark-aucpr repository.

The Rolling algorithm is the same as in this contribution, with some unused functions removed. The Batch AUC-PR function uses a similar algorithm but does not store a window of samples; instead, it receives the scores and y_true as parameters.

In the benchmarks, both algorithms are called directly from C++, i.e., without Cython/Python.
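For readers who want a feel for the batch baseline, the following is a rough Python sketch of a stateless function that takes the scores and y_true as parameters and is called again on the current window after every new sample. It is not the code from the benchmark-aucpr repository: `batch_auc_pr`, `time_batch_over_stream`, and the synthetic stream are all hypothetical, and the step-wise (average-precision) definition of AUC-PR is assumed. The window size of 1000 matches the one quoted from the paper.

```python
import random
import time
from collections import deque


def batch_auc_pr(scores, y_true):
    """Stateless AUC-PR (average-precision form); re-sorts its inputs on every call."""
    n_pos = sum(1 for y in y_true if y)
    if n_pos == 0:
        return 0.0  # assumption: report 0 when there are no positives
    pairs = sorted(zip(scores, y_true), key=lambda p: p[0], reverse=True)
    ap, tp, prev_recall, i = 0.0, 0, 0.0, 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Samples tied at this score form a single threshold step.
        while i < len(pairs) and pairs[i][0] == threshold:
            tp += 1 if pairs[i][1] else 0
            i += 1
        recall = tp / n_pos
        ap += (recall - prev_recall) * (tp / i)  # i == tp + fp at this point
        prev_recall = recall
    return ap


def time_batch_over_stream(n_samples=2000, window_size=1000, seed=0):
    """Time the batch function when it is re-run on the window after each sample."""
    rng = random.Random(seed)
    scores = deque(maxlen=window_size)  # the caller owns the window, not the metric
    labels = deque(maxlen=window_size)
    last = 0.0
    start = time.perf_counter()
    for _ in range(n_samples):
        y = rng.random() < 0.3
        # Synthetic scores: positives are shifted upwards so the metric is informative.
        s = rng.random() * 0.6 + (0.4 if y else 0.0)
        scores.append(s)
        labels.append(y)
        last = batch_auc_pr(scores, labels)
    return time.perf_counter() - start, last


elapsed, final_value = time_batch_over_stream()
print(f"batch over stream: {elapsed:.3f}s, final AUC-PR = {final_value:.4f}")
```

Timings from a pure-Python loop like this say nothing about the C++ results; the sketch only shows the shape of the comparison, where the batch side pays for a full sort on every incoming sample.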