Closed · cwognum closed this issue 1 year ago
There's a bug in the `CombinedCriterion` for classification tasks.
For classification tasks, we use the Brier score as the calibration metric and AUROC as the performance metric. The combined criterion copies the mode of the performance metric, so it has to be maximized, and the calibration and performance scores are therefore multiplied. But the Brier score is a lower-is-better metric: a low Brier score (which is good!) drags the combined criterion down (which is bad).
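To illustrate, here's a minimal sketch of the faulty aggregation (the names are illustrative, not the exact code in the repo):

```python
def combined_criterion(performance: float, calibration: float) -> float:
    # The criterion inherits mode="max" from the performance metric (AUROC),
    # so the two scores are multiplied.
    # Bug: the calibration score is the Brier score, where LOWER is better.
    # A well-calibrated model (low Brier) therefore lowers the combined
    # score, penalizing exactly the models we want to reward.
    return performance * calibration


# AUROC 0.90 with excellent calibration (Brier 0.05) scores worse
# than AUROC 0.90 with terrible calibration (Brier 0.40):
assert combined_criterion(0.90, 0.05) < combined_criterion(0.90, 0.40)
```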
Since we will have to rerun some of the results anyway, we are also considering changing the combined metric altogether. The division/multiplication aggregation might not be the most straightforward thing to explain in the paper.
This is fixed in https://github.com/cwognum/mood-experiments/commit/43a902b43db741b0c78a97188843233661a33adb. We ended up switching to Optuna's multi-objective optimization (MPO), so each metric can be optimized in its own direction instead of being aggregated into a single score.
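For reference, a minimal sketch of what an Optuna multi-objective study looks like in general (the search space and training loop below are placeholders, not the actual experiment code):

```python
import optuna


def train_and_evaluate(lr: float) -> tuple[float, float]:
    # Stand-in for the real training loop: returns (AUROC, Brier score).
    # Dummy values for illustration only.
    return 0.5 + lr, 0.3 - lr


def objective(trial: optuna.Trial) -> tuple[float, float]:
    # Hypothetical search space; the real one is model-specific.
    lr = trial.suggest_float("lr", 1e-5, 1e-1, log=True)
    return train_and_evaluate(lr)


# One direction per objective: maximize AUROC, minimize Brier.
# No sign flipping or multiplication/division aggregation needed.
study = optuna.create_study(directions=["maximize", "minimize"])
study.optimize(objective, n_trials=50)

# With multiple objectives there is no single "best" trial;
# best_trials returns the Pareto-optimal set instead.
for trial in study.best_trials:
    print(trial.values, trial.params)
```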