corybrunson opened 3 weeks ago
These should all be in yardstick. I've made an issue for ranked probability scores, which I favor.
I've read the Sakai paper(s), and they seem to think that probabilistic predictions do not exist.
TBH, everything else that I've seen is problematic in a variety of ways. MSE/MAE/RMSE based on predicted class "distances" are things that we can estimate, but I would not want to rely on them. If we use a class-based metric, I would choose Kappa or alpha or one of the other measures that have been studied and vetted for decades.
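For illustration, a minimal sketch of scoring ordinal predictions with one of those vetted class-based metrics, assuming yardstick's `kap()` and its `weighting` argument (a linearly weighted kappa penalizes errors by how far apart the classes are in the ordering):

```r
library(yardstick)

# Toy ordinal outcome and predictions, encoded as ordered factors
lvls <- c("low", "mid", "high")
d <- data.frame(
  truth = factor(c("low", "mid", "high", "mid", "high"),
                 levels = lvls, ordered = TRUE),
  est   = factor(c("low", "high", "high", "mid", "mid"),
                 levels = lvls, ordered = TRUE)
)

# Linearly weighted kappa: a "low" vs "high" error costs more
# than a "low" vs "mid" error
kap(d, truth, est, weighting = "linear")
```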
A lot of the metrics I see in the CS papers seem poorly motivated, and I get the sense that they've never looked into the massive amounts of prior art on the subject.
Recently, Sakai (2021) compared several class, numeric, and proposed "ordinal" performance measures/metrics on ordinal classification tasks. This raises the questions of (1) what performance measures {yardstick} should make available for ordinal classification models and (2) how to harmonize this decision with package conventions. I don't know what challenges (2) would pose, and anyway they will depend on (1).
I think it's necessary to make measures available that are specifically designed for ordinal classification, in part because there are serious, though separate, theoretical problems with using class and numeric measures. That said, I think there are also good reasons to make both class and numeric measures available:
Because `metric_set()` (understandably) refuses to mix numeric and class measures, perhaps this would be best achieved by allowing `ordinal_reg()` and (its and other) ordinal engines to also play in `'regression'` mode, while the specifically ordinal measures could require (else error) or expect (else warning) that the outcome is `ordered`, that the model type or engine is ordinal, or that some other check is passed. This would unavoidably enable bad practice, but it's bound to come up, and I think it deserves consideration.
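As a sketch of the "require (else error)" option: a hypothetical guard (the name `check_ordinal()` is mine, not an existing yardstick function) that a specifically ordinal metric could run on its inputs before computing anything:

```r
# Hypothetical guard, not part of yardstick: error unless the
# outcome is an ordered factor
check_ordinal <- function(truth) {
  if (!is.ordered(truth)) {
    stop("`truth` must be an ordered factor to use ordinal metrics.",
         call. = FALSE)
  }
  invisible(truth)
}

check_ordinal(factor(c("a", "b"), ordered = TRUE))  # passes silently
# check_ordinal(factor(c("a", "b")))                # would error
```

The "expect (else warning)" variant would be the same check with `warning()` in place of `stop()`.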