Open denmoroz opened 2 years ago
I think this is already supported under the name "AUC", but I'm tagging @imatiach-msft for the exact details.
Hmm, `getAUC` calls `areaUnderROC`, not `areaUnderPR`, so it should be ROC AUC, not PR AUC.
Also, it's not clear what the difference is between areaUnderROC (`MetricConstants.AreaUnderROCMetric`) and AUC (`MetricConstants.AucSparkMetric`), since `getAUC` internally calls `areaUnderROC` from `spark.mllib.evaluation.BinaryClassificationMetrics`. Are these two just synonyms for ROC AUC?
"Also, it's not clear what the difference is between areaUnderROC (`MetricConstants.AreaUnderROCMetric`) and AUC (`MetricConstants.AucSparkMetric`), since `getAUC` internally calls `areaUnderROC` from `spark.mllib.evaluation.BinaryClassificationMetrics`."
Yes, indeed, they are synonymous.
"Hmm, `getAUC` calls `areaUnderROC`, not `areaUnderPR`, so it should be ROC AUC, not PR AUC."
Perhaps we can rename this. Honestly, in my experience "PR AUC" is used a lot less often than ROC AUC. How would you prefer us to rename these? The names were chosen several years ago for reasons that are no longer relevant (similarity to another Microsoft ML platform's metric names).
Usually when I see AUC (Area Under Curve), I assume it already refers to ROC (Receiver Operating Characteristic); PR AUC is used less often.
"Perhaps we can rename this"
There is common agreement in the community that AUC = ROC AUC (`areaUnderROC` in Spark terms), so there's probably no need to rename anything. Instead, it would be nice to add PR AUC (`areaUnderPR` in Spark terms) and name it, for instance, AP (average precision) (naming is not my strength 😓). At least it would then follow LightGBM's metric naming.
"Usually when I see AUC (Area Under Curve), I assume it already refers to ROC (Receiver Operating Characteristic); PR AUC is used less often."
Exactly!
"Honestly, in my experience "PR AUC" is used a lot less often than ROC AUC."
Indeed, but it depends on the task you're solving. ROC reflects the balance between TPR and FPR, while PR reflects the precision-recall balance. PR AUC can help with highly imbalanced datasets: you might have a ROC AUC close to 1.0 but practically zero precision at the same time, and PR AUC is much more informative for such tasks.
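To make the difference concrete, here is a minimal, self-contained sketch (plain Python, not SynapseML or Spark code; the data is a made-up toy example) that computes ROC AUC as a rank statistic and PR AUC as average precision. On a skewed dataset with one poorly ranked positive, the two metrics diverge:

```python
# Toy illustration: ROC AUC vs PR AUC (average precision) on imbalanced data.

def roc_auc(labels, scores):
    """ROC AUC = probability that a random positive outranks a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def average_precision(labels, scores):
    """PR AUC as average precision: precision at each positive's rank, averaged."""
    ranked = sorted(zip(scores, labels), reverse=True)
    tp, ap, n_pos = 0, 0.0, sum(labels)
    for k, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            tp += 1
            ap += tp / k  # precision at this recall step
    return ap / n_pos

# 2 positives among 10 negatives; the second positive ranks behind 5 negatives
labels = [1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05, 0.02, 0.01]

print(roc_auc(labels, scores))            # 0.8
print(average_precision(labels, scores))  # ~0.667 (precision collapses at rank 6)
```

The gap grows with imbalance: adding more negatives above the second positive barely moves ROC AUC (each positive still outranks most negatives) but drags precision, and hence PR AUC, toward zero.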
@imatiach-msft I think area under PR is much better for imbalanced tasks.
@denmoroz -- if you're satisfied with the response, can you please close the issue?
@ppruthi 👋 Sorry, it is still unclear to me from the conversation above whether this feature will be implemented. I can certainly close the issue if it is not planned anytime soon.
**Is your feature request related to a problem? Please describe.**
There is ROC AUC in `ComputeModelStatistics`, but Average Precision (`areaUnderPR`) is absent.

**Describe the solution you'd like**
It would be awesome to add it, as it is very useful for many binary classification tasks.

**Additional context**
None.
AB#1789611