Correct the ROC AUC computations and add relevant tests.

potocpav commented 7 years ago

The old ROC AUC computation was wrong, at least when `y_hat` contains duplicate (tied) values. I added a test demonstrating the issue and fixed the computation.
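For context, one standard tie-aware way to compute ROC AUC is the Mann-Whitney rank formulation: sort by score, give every group of tied scores their average rank, and normalize the rank sum of the positives. The sketch below is illustrative only, not the code merged in this PR. `roc_auc_with_ties` is a hypothetical name, plain `f64` slices stand in for rustlearn's own array type, labels are assumed to be 0.0/1.0, and the `Result<f64, &'static str>` return type is an assumption.

```rust
/// Hypothetical helper (not rustlearn's API): ROC AUC computed from the
/// Mann-Whitney U statistic. Tied scores share their average rank, which
/// is equivalent to counting each tied (positive, negative) pair as 1/2.
/// Labels are assumed to be 0.0/1.0; anything above 0.5 counts as positive.
fn roc_auc_with_ties(y_true: &[f64], y_hat: &[f64]) -> Result<f64, &'static str> {
    if y_true.len() != y_hat.len() {
        return Err("y_true and y_hat must have the same length");
    }
    if y_hat.iter().any(|s| s.is_nan()) {
        return Err("y_hat must not contain NaN");
    }

    // Indices of the data points, sorted by predicted score (ascending).
    // The NaN check above makes the unwrap safe.
    let mut order: Vec<usize> = (0..y_hat.len()).collect();
    order.sort_by(|&a, &b| y_hat[a].partial_cmp(&y_hat[b]).unwrap());

    let mut pos_rank_sum = 0.0;
    let mut n_pos = 0.0;
    let mut i = 0;
    while i < order.len() {
        // Advance j past every point whose score ties with order[i].
        let mut j = i + 1;
        while j < order.len() && y_hat[order[j]] == y_hat[order[i]] {
            j += 1;
        }
        // The run occupies 1-based ranks i+1..=j; each member gets their mean.
        let avg_rank = (i + 1 + j) as f64 / 2.0;
        for &k in &order[i..j] {
            if y_true[k] > 0.5 {
                pos_rank_sum += avg_rank;
                n_pos += 1.0;
            }
        }
        i = j;
    }

    let n_neg = y_hat.len() as f64 - n_pos;
    if n_pos == 0.0 || n_neg == 0.0 {
        return Err("AUC is undefined unless both classes are present");
    }
    // U = rank sum of the positives minus its smallest possible value.
    Ok((pos_rank_sum - n_pos * (n_pos + 1.0) / 2.0) / (n_pos * n_neg))
}

fn main() {
    // Hypothetical data in which scores are tied between a positive and a negative.
    let y_true = [0.0, 1.0, 0.0, 1.0, 1.0];
    let y_hat = [0.1, 0.35, 0.35, 0.8, 0.1];
    println!("{:.4}", roc_auc_with_ties(&y_true, &y_hat).unwrap()); // prints 0.6667
}
```

Because each tied group receives a single averaged rank, the result no longer depends on how the sort happens to order equal scores, which is exactly the kind of input order that should not matter.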

The table below summarizes the resulting AUC for each test. Tests 0 and 1, and tests 2 and 3, differ only in the order of the data points, so each pair should obviously return the same AUC. I checked the correctness of the new values on paper.

EDIT: I also checked them using Python's `sklearn.metrics.roc_auc_score`.

| test | old AUC | new AUC |
|------|---------|---------|
| 0 | `Ok(1)` | `Ok(0.75)` |
| 1 | `Ok(0.25)` | `Ok(0.75)` |
| 2 | `Ok(0.625)` | `Ok(0.875)` |
| 3 | `Ok(1)` | `Ok(0.875)` |
| 4 | `Ok(0.16666666)` | `Ok(0.5)` |
| 5 | `Ok(NaN)` | `Ok(0.25)` |
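Values like these can be checked by hand with the pairwise definition of ROC AUC: the fraction of (positive, negative) pairs in which the positive gets the higher score, with a tie counting as half a pair. The sketch below is hypothetical: `pairwise_auc` is not part of rustlearn, labels are assumed to be 0.0/1.0, and the data set is made up rather than taken from this PR's tests. Since the definition is a sum over pairs, it also shows why reordering the data points, as in tests 0/1 and 2/3, cannot change a correct result.

```rust
/// Hypothetical reference check (not rustlearn's API): ROC AUC straight
/// from its definition, the fraction of (positive, negative) pairs in which
/// the positive scores higher, with a tie counted as half a pair.
/// O(n_pos * n_neg), so it is only meant for small, hand-checkable tests.
fn pairwise_auc(y_true: &[f64], y_hat: &[f64]) -> Option<f64> {
    let mut pos = Vec::new();
    let mut neg = Vec::new();
    for (&t, &s) in y_true.iter().zip(y_hat) {
        // Labels are assumed to be 0.0/1.0.
        if t > 0.5 {
            pos.push(s);
        } else {
            neg.push(s);
        }
    }
    if pos.is_empty() || neg.is_empty() {
        return None; // undefined without both classes
    }
    let mut correct = 0.0;
    for &p in &pos {
        for &n in &neg {
            if p > n {
                correct += 1.0;
            } else if p == n {
                correct += 0.5;
            }
        }
    }
    Some(correct / (pos.len() * neg.len()) as f64)
}

fn main() {
    // Hypothetical data, plus the same points in reverse order.
    let y_true = [1.0, 0.0, 1.0, 0.0, 0.0];
    let y_hat = [0.7, 0.7, 0.2, 0.4, 0.1];
    let rev_true: Vec<f64> = y_true.iter().rev().copied().collect();
    let rev_hat: Vec<f64> = y_hat.iter().rev().copied().collect();

    // A sum over all pairs does not depend on the order of the points.
    assert_eq!(pairwise_auc(&y_true, &y_hat), pairwise_auc(&rev_true, &rev_hat));
    println!("{:.4}", pairwise_auc(&y_true, &y_hat).unwrap()); // prints 0.5833
}
```

The quadratic cost makes this unsuitable as the library implementation, but it is easy to verify on paper and works as an oracle for small test cases.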

maciejkula commented 7 years ago

Thanks for the fix!

potocpav commented 7 years ago

I noticed your coding style differs from mine and at first tried to just fix your code, but then I got lazy and copy-pasted the code from my own project... So, is this better?

maciejkula commented 7 years ago

Great, thanks a lot!

maciejkula / rustlearn

Correct the ROC AUC computations and add relevant tests. #35