Hi,

I was recently using the `mrr_score` implementation (link). I am not sure if I've misunderstood the current implementation, but as far as I can see, it does not account for situations where there are multiple positive examples in one sample.

Furthermore, according to the documentation the input should be `Predicted labels`; however, we are more interested in the ranking of the positive item in a given sample (MRR-wiki). My suggestion is:

The original implementation seems correct if it is given the rankings, not the predicted labels. When assuming all items are positive, as in my example:

```python
mrr_score([1, 1, 1], [3, 2, 1])
# 0.611111111111111, i.e. (1/3 + 1/2 + 1/1) / 3
```

But then, `y_true` is not a needed input.

If I haven't misunderstood and you agree, I would be happy to make a PR with the suggested improvements.
I followed the example used in the Medium post: MRR vs MAP vs NDCG: Rank-Aware Evaluation Metrics And When To Use Them (behind paywall).
Thanks for the awesome repo!