valence-labs / mood-experiments

Molecular Out-Of-Distribution

Bug in CombinedCriterion: Some results need to be repeated. #1

Closed: cwognum closed this issue 1 year ago

cwognum commented 1 year ago

There's a bug in the CombinedCriterion for classification tasks.

For classification tasks, we use the Brier score as the calibration metric and AUROC as the performance metric. The combined criterion copies the optimization direction (mode) of the performance metric, so for classification it has to be maximized, and the calibration and performance scores are therefore multiplied. But the Brier score is lower-is-better: a low Brier score (which is good!) leads to a low criterion score (which is bad).
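The inversion can be reproduced in a few lines. This is a minimal sketch that assumes the classification criterion reduces to `auroc * brier` and is maximized; the function names and example scores are hypothetical, not taken from the repository. The `oriented_criterion` variant only illustrates one way to make both factors point in the same direction. It is not the fix adopted in this issue.

```python
# Hypothetical sketch of the sign mismatch in the combined criterion.
# AUROC: higher is better. Brier score: lower is better.


def buggy_criterion(auroc: float, brier: float) -> float:
    # Both scores are multiplied and the result is maximized, so a
    # *higher* (worse) Brier score increases the criterion.
    return auroc * brier


def oriented_criterion(auroc: float, brier: float) -> float:
    # Illustrative alternative (an assumption): flip the Brier score so both
    # factors are higher-is-better. For binary classification the Brier
    # score lies in [0, 1], so 1 - brier does too.
    return auroc * (1.0 - brier)


def select_best(candidates: dict[str, tuple[float, float]], criterion) -> str:
    # Pick the candidate that maximizes the criterion, mirroring a
    # hyperparameter search whose objective has mode "max".
    return max(candidates, key=lambda name: criterion(*candidates[name]))


# Two hypothetical models with the same AUROC but different calibration.
candidates = {
    "well_calibrated": (0.80, 0.10),
    "poorly_calibrated": (0.80, 0.30),
}

print(select_best(candidates, buggy_criterion))     # poorly_calibrated
print(select_best(candidates, oriented_criterion))  # well_calibrated
```

With the buggy aggregation the poorly calibrated model scores 0.24 against 0.08, so maximizing the criterion actively selects worse calibration.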

Since we will have to rerun some of the experiments anyway, we are also considering replacing the combined metric altogether: the division/multiplication aggregation may be hard to explain clearly in the paper.

cwognum commented 1 year ago

This is fixed in https://github.com/cwognum/mood-experiments/commit/43a902b43db741b0c78a97188843233661a33adb. We ended up switching to Optuna MPO.