Closed felipemello1 closed 3 years ago
Hi! Good point. It's true that hits@20 used here is not the standard "hits@20" used in the KG completion literature. In DDI, the task is more of mining the positive edges in the pool of all possible edges (i.e., pairs of nodes). Hence, we are evaluating the positive edges against the pool of negative edges (without fixing the end node).
Thanks for answering @weihua916 ! Out of 111k edges, 33% of valid+test is positive, 66% is negative. If the model mistakenly scores 20 negative edges as 1, then it means that my hits@20 will be 0, even if my F1 score is 0.95, for example. Did I get it right?
You are welcome! No that's not correct. We are scoring each positive edge against the pool of provided negative edges. The metric is the ratio of positive edges that are ranked within the top 20 among the negative edges.
Ok, I guess my problem then is understanding what the "pool of provided negative edges" is.
In my first comment, I thought that ideally it would be, for each positive triple u,e,v,
a tailored subset of negative edges of type u,e,corrupted
. However, what I understood from your first answer was that this pool will be the same for every positive edge, and will be composed of all 66% of 111k negative edges (~70k edges). In other words, we would compare each positive edge with the same set of 70k negative edges.
Thanks for your patience.
Yes, that's exactly what we did.
Then, I am not sure if I understand why my previous comment was incorrect. Lets say I get a F1 score of 0.95, and predict the whole dataset pretty well.
If in this negative pool of 70k I have 20 wrong positive predictions = 1, then my hits@20 will be 0 for the whole dataset, because no scores can be > 1 and the following term would be zero float(torch.sum(y_pred_pos > kth_score_in_negative_edges).cpu())
I see what you mean. Yes, if there are 20 negative edges on which your model predicts super high scores (higher than any of the positive edges), then the model performance would be 0.
Got it! Thanks @weihua916
Hi, I am having a hard time trying to understand the hits@20 for DDI link prediction. Could someone please help me with it? This is the evaluator code
So I tried to build an example with K=3:
What I understand that the algorithm is doing is to get the top_K highest error. Then, if we had to iterate over each positive we got right, all we would have to do is to see how many positives are larger than the top_K error. Therefore, score = 3 / 5 = 0.6
What I dont get is how this broadcasts to each specific drug. In my mind, if I have a triple
u,e,v
, to generate the corruptions it would always have to beu,e, corrupted_node
, without ever changingu
. But the algorithm completely disregards that, since it evaluates all validation/test set at once. The corrupted triples can ber,e,t,
wherer != u
. If I had to predict 1MM triples and, for a specificnode 'r'
, I got K of them wrong with very high score, thennode 'u'
would be impacted by these triple of typer,e,_
. Is that correct?I was thinking that the right approach would be to create a set of corrupted triples specific to each
node u
, with the formatu,e, corrupted_node
. Something like below. Could someone please explain to me why I am wrong and the current method works?