Open davidlpgomes opened 2 months ago
Hey, great contribution! 😄 Could you provide some benchmarks to illustrate how much the rolling AUC calculation has sped up?
Hey @AdilZouitine, thanks! My team and I are the writers of the paper mentioned.
In the paper, we ran several experiments on various stream datasets, comparing our prequential algorithm with the batch version (as well as scikit-learn's batch implementation). On average, our algorithm was 13 times faster and used 12 times less energy than the batch algorithm (with a window of size 1000).
I will implement a simple stream experiment comparing the time spent to calculate the AUC-PR using our prequential algorithm and the batch version. I'll send the link to the repository when I'm done 😃
Hey @AdilZouitine, the benchmarks (code and results) comparing the Rolling AUC-PR and the Batch AUC-PR are presented in my benchmark-aucpr repository.
The Rolling algorithm is the same as in this contribution, with some unused functions removed. The Batch AUC-PR function uses a similar algorithm but does not store a window of samples; instead, it receives the scores and y_true as parameters.
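For readers unfamiliar with the batch function's contract, here is a minimal Python sketch of what "receives the scores and y_true as parameters" amounts to. This is an illustration only, not the repository's C++ code, and it uses the step-wise (average-precision style) rule to integrate the precision-recall curve; the exact interpolation in the paper's implementation may differ.

```python
def batch_aucpr(y_true, scores):
    """Exact AUC-PR over a full batch of (label, score) pairs.

    y_true: iterable of 0/1 labels; scores: matching classifier scores.
    Sweeps thresholds in descending score order, handling score ties
    as a single threshold, and sums precision * delta(recall).
    """
    pairs = sorted(zip(scores, y_true), key=lambda p: -p[0])
    n_pos = sum(y_true)
    if n_pos == 0:
        return 0.0  # AUC-PR is undefined with no positives; 0.0 by convention here
    tp = fp = 0
    auc = prev_recall = 0.0
    i, n = 0, len(pairs)
    while i < n:
        # Advance over all pairs sharing the current score (tie group).
        j = i
        while j < n and pairs[j][0] == pairs[i][0]:
            tp += pairs[j][1]
            fp += 1 - pairs[j][1]
            j += 1
        recall = tp / n_pos
        auc += (recall - prev_recall) * (tp / (tp + fp))
        prev_recall = recall
        i = j
    return auc
```

A perfect ranking (all positives scored above all negatives) yields 1.0, which is a quick sanity check for any reimplementation.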
In the benchmarks, both are called directly from C++, i.e., without Cython/Python.
A C++ implementation of the Prequential/Rolling AUC-PR; Cython is used to compile the code.
It uses a sliding window of size S, calculating the exact (i.e., not approximated) AUC-PR over the last S seen instances.
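To make the sliding-window semantics concrete, here is a naive Python sketch that keeps the last S (label, score) pairs and recomputes the exact AUC-PR on demand. The class name `WindowedAUCPR` is hypothetical; this illustrates only what the metric reports, while the prequential algorithm in this contribution updates the value incrementally instead of recomputing from scratch.

```python
from collections import deque


class WindowedAUCPR:
    """Naive exact AUC-PR over the last S instances (recompute-on-read).

    Hypothetical illustration of window semantics; not the PR's
    incremental C++/Cython implementation.
    """

    def __init__(self, window_size=1000):
        # deque with maxlen evicts the oldest pair automatically
        self.window = deque(maxlen=window_size)

    def update(self, y_true, score):
        self.window.append((y_true, score))
        return self

    def get(self):
        # Exact AUC-PR via the step-wise rule over the current window.
        pairs = sorted(self.window, key=lambda p: -p[1])
        n_pos = sum(y for y, _ in pairs)
        if n_pos == 0:
            return 0.0
        tp = fp = 0
        auc = prev_recall = 0.0
        i, n = 0, len(pairs)
        while i < n:
            j = i
            while j < n and pairs[j][1] == pairs[i][1]:
                tp += pairs[j][0]
                fp += 1 - pairs[j][0]
                j += 1
            recall = tp / n_pos
            auc += (recall - prev_recall) * (tp / (tp + fp))
            prev_recall = recall
            i = j
        return auc
```

Recomputing costs O(S log S) per read, which is exactly the overhead the prequential algorithm avoids.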
Based on Gomes, Grégio, Alves, and Almeida, 2023.