Bug in CombinedCriterion: Some results need to be repeated.

There's a bug in the CombinedCriterion for classification tasks.

For classification tasks, we use the Brier score as the calibration metric and AUROC as the performance metric. The combined criterion copies the mode of the performance metric, so it has to be maximized. Because of that, the calibration and performance scores are multiplied. The Brier score, however, is a metric that should be minimized: a low Brier score (which is good!) leads to a low criterion score (which is bad).
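To make the failure mode concrete, here is a minimal sketch. It is not the actual CombinedCriterion code: the function names and the example scores are made up for illustration, and the `1 - Brier` variant is only one possible way to flip the direction, not a decided fix.

```python
def combined_buggy(auroc: float, brier: float) -> float:
    """Illustrative stand-in for the current behaviour: multiply both scores.

    AUROC is higher-is-better, the Brier score is lower-is-better, so the
    product rewards a *worse* Brier score when the result is maximized.
    """
    return auroc * brier


def combined_flipped(auroc: float, brier: float) -> float:
    """Hypothetical alternative: turn the Brier score into a higher-is-better
    quantity before multiplying. Assumes the Brier score lies in [0, 1]."""
    return auroc * (1.0 - brier)


# Two hypothetical models with the same AUROC but different calibration.
well_calibrated = {"auroc": 0.9, "brier": 0.05}
poorly_calibrated = {"auroc": 0.9, "brier": 0.25}

# Maximizing the buggy criterion prefers the poorly calibrated model.
buggy_prefers_poor = combined_buggy(**poorly_calibrated) > combined_buggy(**well_calibrated)

# With the direction flipped, the well calibrated model wins.
flipped_prefers_good = combined_flipped(**well_calibrated) > combined_flipped(**poorly_calibrated)

print(buggy_prefers_poor, flipped_prefers_good)
```

Both comparisons evaluate to true: with equal AUROC, the multiplied criterion ranks the model with the higher (worse) Brier score first.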

Since we will have to rerun some of the results anyway, we are also considering changing the combined metric altogether. The division / multiplication aggregation might not be the most straightforward to explain in the paper.

valence-labs / mood-experiments
