skojaku / degree-corrected-link-prediction-benchmark


Adding new metrics #75

Open skojaku opened 1 month ago

skojaku commented 1 month ago

Hits@k and MRR are other widely used metrics for link prediction. Do we see the same patterns as with AUC-ROC?

skojaku commented 1 month ago

First of all, let's look at these metric values as a function of degree heterogeneity:

Uniform sampling

Hits@100: [figure]

MRR: [figure]

Degree-biased sampling

Hits: [figure]

MRR: [figure]
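For context, here is a minimal sketch of the two negative-sampling schemes as I understand them from this thread. The function name and the proportional-to-degree endpoint weighting in the biased case are illustrative assumptions, not the benchmark's exact code:

```python
import numpy as np

def sample_negative_edges(A, n_samples, biased=False, seed=None):
    """Sample node pairs that are not edges of the network.

    A      : dense adjacency matrix (numpy array)
    biased : False -> endpoints drawn uniformly at random
             True  -> endpoints drawn proportionally to their degree
                      (an illustrative degree-biased scheme, not
                      necessarily the exact benchmark code)
    """
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    deg = A.sum(axis=1)
    p = deg / deg.sum() if biased else np.full(n, 1.0 / n)

    negatives = []
    while len(negatives) < n_samples:
        # draw two distinct endpoints according to p
        i, j = rng.choice(n, size=2, replace=False, p=p)
        if A[i, j] == 0:  # keep only non-edges
            negatives.append((i, j))
    return np.array(negatives)
```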

skojaku commented 1 month ago

Zooming into the Hits@100 + uniform sampling figure shows that Hits@100 tends to increase with degree heterogeneity, just as we see for AUC-ROC.

Hits@100 + uniform sampling: [figure]

For MRR, we could not see the same pattern, although PA stays stuck at 1. [figure]

Note that MRR is calculated from the single top-ranked "positive" edge in the ranking of all edges. Because it is based on an extreme value like this, the trend might be obscured.
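To make this concrete, here is a minimal sketch of this global-ranking MRR (the reciprocal rank of the single best-ranked positive edge among all candidate edges); the function and argument names are just for illustration:

```python
import numpy as np

def mrr_single_ranking(scores, labels):
    """MRR computed from one global ranking of all candidate edges.

    scores : predicted score per candidate edge
    labels : 1 for positive (true) edges, 0 for negatives

    The reciprocal rank of the best-ranked positive edge is returned,
    so the metric depends on a single extreme value of the ranking.
    """
    order = np.argsort(-np.asarray(scores))                # best score first
    ranked_labels = np.asarray(labels)[order]
    best_rank = np.flatnonzero(ranked_labels == 1)[0] + 1  # 1-based rank
    return 1.0 / best_rank

# e.g. scores=[0.9, 0.2, 0.7], labels=[0, 1, 1]
# -> best positive edge is ranked 2nd -> MRR = 0.5
```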

skojaku commented 4 weeks ago

As for the correlation with the retrieval task, we see consistent patterns for Hits@100:

[figure]

skojaku commented 4 weeks ago

For MRR, the correlation is less clear.

[figure]

Having checked the MRR values, I found a large number of identical values (like 1.0, 0.5, etc.). This might be why the rank correlation is less clear.

skojaku commented 4 weeks ago

I misunderstood how OGB computed MRR. I will recompute it shortly.
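For reference, this is the per-positive formulation I believe OGB uses (e.g., for ogbl-citation2, each positive edge is ranked against its own set of sampled negatives and the reciprocal ranks are averaged). This is a hedged sketch of my understanding, not OGB's actual evaluator code:

```python
import numpy as np

def mrr_per_positive(pos_scores, neg_scores):
    """MRR in the per-positive style (my understanding of OGB's setup).

    pos_scores : (n_pos,)        score of each positive edge
    neg_scores : (n_pos, n_neg)  scores of the negatives paired with
                                 each positive edge

    Each positive edge is ranked against its own negatives; the mean
    of the reciprocal ranks is returned. Ties are broken optimistically.
    """
    pos_scores = np.asarray(pos_scores)[:, None]
    neg_scores = np.asarray(neg_scores)
    # rank = 1 + number of negatives scoring strictly higher than the positive
    ranks = 1 + (neg_scores > pos_scores).sum(axis=1)
    return float(np.mean(1.0 / ranks))
```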

rachithaiyappa commented 4 weeks ago

The Hits@100 plots sort of make sense to me, although I am still surprised that uniform sampling shows PA so close to 1.

  1. From my understanding, Hits@k is $\frac{\text{number of correct predictions in the top } k \text{ predictions}}{\text{total size of the test set}}$. So the way we do this is by having each link prediction algorithm generate the 100 links it thinks are most likely, and counting how many of these top 100 are actually present in the network. That number is the numerator. The denominator is then the number of positive edges in our test set. Is this correct?
  2. Also, do we always have more than 100 positive edges in our test sets?
skojaku commented 4 weeks ago
  1. I normalized Hits@k by the maximum possible score (i.e., the top k being all positive) to see how close it is to perfect prediction. The number of positive edges depends on the number of edges in the network, which varies substantially, so to compare performance across networks of different sizes we need some kind of normalization. This thinking led me to this normalization (a small sketch follows after point 2).

  2. Regarding the sampling, I used a sample of positive edges and the same number of negative edges. So if I take 25% of all edges as the positive edges, we sample the same number of negative edges. Then I ranked all of these edges together. Alternatively, we could rank at the level of nodes, i.e., rank the edges pertaining to each node and average across nodes, but due to lack of time I could not do that. It would surely be useful to test such node-level metrics; something we might want to do in the future.
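As mentioned under point 1, here is a minimal sketch of the normalized Hits@k on a pooled ranking of positives and an equal number of sampled negatives. The normalization by min(k, #positives) is my reading of "maximum possible score", and the names are illustrative:

```python
import numpy as np

def normalized_hits_at_k(scores, labels, k=100):
    """Hits@k normalized by the maximum achievable score.

    scores : predicted score per candidate edge (positives plus an
             equal number of sampled negatives, ranked together)
    labels : 1 for positive edges, 0 for sampled negatives
    k      : cutoff

    The raw Hits@k (number of positives in the top k) is divided by
    the best possible value, min(k, number of positives), so that the
    score is comparable across networks with different numbers of edges.
    """
    scores = np.asarray(scores)
    labels = np.asarray(labels)
    top_k = np.argsort(-scores)[:k]   # indices of the k highest-scored edges
    hits = labels[top_k].sum()
    return hits / min(k, labels.sum())
```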