openml / automlbenchmark

OpenML AutoML Benchmarking Framework
https://openml.github.io/automlbenchmark
MIT License

Negate error metrics to ensure that higher is ALWAYS better #268

Closed Innixma closed 3 years ago

Innixma commented 3 years ago

Example of the issue

Currently, depending on the metric, a result value of 0.6 compared to 0.4 can be either better (accuracy, auc, r2, etc.) or worse (rmse, log_loss, mae, etc.). When generating aggregated analysis from the results, this knowledge has to be hardcoded by the user, which is error prone.

I'd like to propose adding a new column, higher_is_better. If a higher value of the result metric means a better score, then higher_is_better should be True (or 1); otherwise, it should be False (or 0).
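A minimal sketch of how such a column could be derived, assuming a pandas results table with "metric" and "result" columns (the column names and the set of error metrics here are illustrative, not the framework's actual schema):

```python
import pandas as pd

# Hypothetical set of "lower is better" metrics, for illustration only.
ERROR_METRICS = {"rmse", "mae", "log_loss"}

# Example results table with one row per benchmark run.
df = pd.DataFrame({
    "metric": ["auc", "log_loss", "rmse"],
    "result": [0.92, 0.35, 1.70],
})

# Derive the proposed column from the metric name.
df["higher_is_better"] = ~df["metric"].isin(ERROR_METRICS)
```

With this column present, downstream analysis code can branch on `higher_is_better` instead of hardcoding per-metric knowledge.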

Another alternative is to always report metrics in higher_is_better form, flipping the sign where needed to align them. This would cause a log_loss of 0.5 to be reported as -0.5. This is the strategy used by several AutoML systems, such as AutoGluon and MLJAR, although it can be confusing when interpreted by humans (but is great for computers).
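The sign-flipping convention can be sketched as follows; the metric names in the "lower is better" set are assumptions for illustration, not an exhaustive list:

```python
# Hypothetical set of error metrics where lower values are better.
LOWER_IS_BETTER = {"rmse", "mae", "log_loss"}

def aligned_score(metric: str, value: float) -> float:
    """Return the result so that a higher value is always better."""
    return -value if metric in LOWER_IS_BETTER else value

# Sorting and argmax then become metric-agnostic:
results = [("log_loss", 0.5), ("log_loss", 0.3)]
best = max(results, key=lambda r: aligned_score(*r))
```

The benefit is that comparison logic (sorting, ranking, selecting the best framework) needs no per-metric special cases; the cost is that reported values like -0.5 for log_loss require readers to know the convention.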

PGijsbers commented 3 years ago

Another alternative is to report metrics in higher_is_better form always, and flip the sign to align them.

I think I would prefer this. As in #262 (which the screenshot is from), too many columns make the table hard to read, especially if it means each row of the table is wrapped onto two (or more) lines on the terminal. Perhaps we can just prefix the column names with - (e.g. acc, auc, -logloss).

Innixma commented 3 years ago

If it is acceptable for your system, I would prefer the signs to be flipped as well, as it adds a great deal of consistency to the code and makes sorting much easier, since the user doesn't have to both understand and use the higher_is_better column. As for -logloss, that is an interesting idea I hadn't thought of before, and I don't have a strong opinion on it.