ludwig-ai / ludwig

Low-code framework for building custom LLMs, neural networks, and other AI models
http://ludwig.ai
Apache License 2.0
11.18k stars 1.19k forks

Add in-training tooling to find a more optimal threshold for binary classification. #2181

Closed justinxzhao closed 3 weeks ago

justinxzhao commented 2 years ago

Ludwig uses a default threshold of 0.5 to calculate accuracy for binary classification problems. However, it is quite possible, especially for imbalanced datasets, that 0.5 is not the best threshold to use.

The ROC curve evaluates a binary classifier across all possible decision thresholds (the AUC summarizes this in a single number), and sweeping those thresholds is a common way to find an operating point with a better balance of precision and recall.
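As a sketch of that idea (assuming scikit-learn is available; this is not part of Ludwig's API, and the function name is illustrative), the ROC curve enumerates every candidate threshold directly, and Youden's J statistic (TPR − FPR) is one common criterion for choosing among them:

```python
import numpy as np
from sklearn.metrics import roc_curve

def threshold_from_roc(targets, probabilities):
    # roc_curve tries every distinct score as a candidate threshold;
    # Youden's J = TPR - FPR picks the point farthest from the chance diagonal.
    fpr, tpr, thresholds = roc_curve(targets, probabilities)
    return thresholds[np.argmax(tpr - fpr)]

# Imbalanced toy data: 90% negatives with low scores, 10% positives with higher ones.
rng = np.random.default_rng(0)
targets = np.concatenate([np.zeros(900), np.ones(100)])
probabilities = np.concatenate([rng.beta(2, 8, size=900), rng.beta(6, 4, size=100)])

best = threshold_from_roc(targets, probabilities)
```

With imbalanced data like this, the selected threshold typically lands well away from the 0.5 default.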

One such algorithmic outline, proposed by @geoffreyangus and @w4nderlust:

import numpy as np

def find_best_threshold(model, output_feature_name, dataset, metric,
                        thresholds=np.arange(0.05, 1.0, 0.05)):
  probabilities = model.predict(dataset)[output_feature_name]['probabilities']
  scores = []
  for threshold in thresholds:
    # Positive-class probability is in column 1.
    preds = probabilities[:, 1] > threshold
    metric_score = metric(preds, targets)  # TODO: extract targets from `dataset`
    scores.append(metric_score)
  return thresholds[np.argmax(scores)]

By default, the optimal threshold should be calculated at the end of the training phase.

It would also be useful to expose this as a standalone API.
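A standalone version of the sweep could be a small, self-contained function over plain arrays. The sketch below makes no use of Ludwig's model API; the function and metric names are illustrative, and the minimal F1 implementation stands in for whatever metric the caller supplies:

```python
import numpy as np

def f1(preds, targets):
    # Minimal F1 on boolean arrays (illustrative stand-in for the caller's metric).
    tp = np.sum(preds & targets)
    fp = np.sum(preds & ~targets)
    fn = np.sum(~preds & targets)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def find_best_threshold_standalone(probabilities, targets, metric=f1,
                                   thresholds=np.arange(0.05, 1.0, 0.05)):
    # Sweep candidate thresholds; keep the one with the best metric score.
    scores = [metric(probabilities > t, targets) for t in thresholds]
    return thresholds[int(np.argmax(scores))]

# Imbalanced toy example: 90% negatives, positives scored higher on average.
rng = np.random.default_rng(0)
targets = np.concatenate([np.zeros(900, dtype=bool), np.ones(100, dtype=bool)])
probabilities = np.concatenate([rng.beta(2, 8, size=900), rng.beta(5, 5, size=100)])

best = find_best_threshold_standalone(probabilities, targets)
```

Since the sweep includes (approximately) 0.5 among its candidates, the selected threshold can only match or improve on the default under the chosen metric.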

amholler commented 2 years ago

An example that works on the current code is here: https://github.com/ludwig-ai/experiments/blob/main/automl/heuristics/santander_customer_satisfaction/eval_util.py with an example invocation here: https://github.com/ludwig-ai/experiments/blob/main/automl/heuristics/santander_customer_satisfaction/train_tabnet_imbalance_ros.py

justinxzhao commented 2 years ago

Largely a duplicate of #2158