Closed: Juanhui28 closed this issue 10 months ago.
Thanks for raising this issue. I think for Hits@K, the "pessimistic" implementation makes more sense to me. Suppose you predict exactly the same score, say `alpha`, for all positive and negative edges, i.e., the entries of `y_pred_pos` and `y_pred_neg` are all `alpha`. Then `kth_score_in_negative_edges` would be `alpha`. If we were "optimistic" and changed this line to

```python
hitsK = float(torch.sum(y_pred_pos >= kth_score_in_negative_edges).cpu()) / len(y_pred_pos)
```

then `hitsK` would be 1, which is clearly not ideal. Let me know if this makes sense.
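To make the tie behaviour concrete, here is a minimal pure-Python sketch (the helper name `hits_at_k` is hypothetical; the real implementation is torch-based in ogb's `Evaluator`). It shows that when every score ties, the optimistic variant reports a perfect score while the pessimistic one reports zero:

```python
# Hypothetical pure-Python sketch of the Hits@K tie behaviour discussed above.
def hits_at_k(y_pred_pos, y_pred_neg, k, optimistic=False):
    """Fraction of positive edges scoring above the k-th best negative edge."""
    kth_score = sorted(y_pred_neg, reverse=True)[k - 1]
    if optimistic:
        # ties with the k-th negative count as hits
        hits = sum(1 for s in y_pred_pos if s >= kth_score)
    else:
        # ties do not count as hits
        hits = sum(1 for s in y_pred_pos if s > kth_score)
    return hits / len(y_pred_pos)

# A degenerate predictor: the same score alpha for every edge.
alpha = 0.7
pos = [alpha] * 5
neg = [alpha] * 100

print(hits_at_k(pos, neg, k=50, optimistic=True))   # 1.0 -- trivially "perfect"
print(hits_at_k(pos, neg, k=50, optimistic=False))  # 0.0
```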
Thanks for your reply! Yes, ranking the positive sample first in case of a tie doesn't make sense. Then how about we combine ranking last and ranking first, the way the MRR is computed?

```python
optimistic_rank = (y_pred_neg >= y_pred_pos).sum(dim=1)
pessimistic_rank = (y_pred_neg > y_pred_pos).sum(dim=1)
ranking_list = 0.5 * (optimistic_rank + pessimistic_rank) + 1
```

Doesn't that seem more reasonable?
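For a single positive edge, the combined rank works out as follows. This is a pure-Python sketch (the helper name `mid_rank` is hypothetical), keeping the variable names from the snippet above even though, as noted elsewhere in the thread, the optimistic/pessimistic labels read swapped:

```python
# Hypothetical sketch of the combined (mid) rank for one positive edge.
def mid_rank(pos_score, neg_scores):
    # counts negatives scoring at least as high: the worst-case rank offset
    optimistic_rank = sum(1 for s in neg_scores if s >= pos_score)
    # counts negatives scoring strictly higher: the best-case rank offset
    pessimistic_rank = sum(1 for s in neg_scores if s > pos_score)
    return 0.5 * (optimistic_rank + pessimistic_rank) + 1

# The positive ties with two of the four negatives:
# best case it ranks 2nd, worst case 4th, so the combined rank is 3.
print(mid_rank(0.5, [0.9, 0.5, 0.5, 0.1]))  # 3.0
```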
Hi! I am a bit confused. In these lines, aren't "optimistic" and "pessimistic" the opposite? It looks like `optimistic_rank` is always larger than `pessimistic_rank`. In general, I think we should only consider the pessimistic ranking for the resulting link prediction system to be useful, so the MRR should only use the pessimistic rank, in my opinion. What do you think?
Actually, the code in my last comment is from the official ogb code for computing the MRR: the `_eval_mrr` function in the `Evaluator` class. Yes, I agree the optimistic/pessimistic names are swapped there. But the MRR does consider both rankings in the code.

I also think combining the two is more reasonable, because always ranking the positive sample first (or always last) among ties is unfair. The line `ranking_list = 0.5 * (optimistic_rank + pessimistic_rank) + 1` effectively ranks it in the middle of the tied block. In other words, it is as if we broke the tie randomly: the expected rank under a random tie-break is the middle, which makes more sense to me.
The MRR code came in through a PR, so I am confused (even though I approved it back then) :)
For Hits@K, the optimistic variant does not make sense to me because you can achieve a score of 1 just by predicting the same score everywhere. Even after averaging with the pessimistic variant, the score would be 0.5. I am leaning toward the pessimistic one because I don't think it is right to predict the same score for many links to game the metric. What if you predicted the same score across all pairs of nodes? That would still give you a Hits@K of 0.5.
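The failure mode of averaging for Hits@K can be checked directly. This is a pure-Python sketch (the helper `averaged_hits_at_k` is hypothetical, not the ogb implementation): a constant predictor that has learned nothing still scores 0.5.

```python
# Hypothetical sketch: averaging optimistic and pessimistic Hits@K
# rewards a constant predictor with a score of 0.5.
def averaged_hits_at_k(y_pred_pos, y_pred_neg, k):
    kth_score = sorted(y_pred_neg, reverse=True)[k - 1]
    optimistic = sum(1 for s in y_pred_pos if s >= kth_score) / len(y_pred_pos)
    pessimistic = sum(1 for s in y_pred_pos if s > kth_score) / len(y_pred_pos)
    return 0.5 * (optimistic + pessimistic)

pos = [0.0] * 10   # constant score for every positive edge...
neg = [0.0] * 100  # ...and every negative edge
print(averaged_hits_at_k(pos, neg, k=20))  # 0.5 without learning anything
```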
Hi, thanks for providing a nice package for the ogb datasets! I have a concern about the evaluation code for Hits@K. In your implementation, the positive sample is always ranked last among ties. I am not sure whether that is unfair. For the MRR, on the other hand, both ranking last and ranking first are considered, which seems more reasonable to me.

Do we also need to consider both ranking last and ranking first for Hits@K as well?

Thanks!