rafguns / linkpred

Easy link prediction tool

ROC Plot #20

Closed DinosL closed 5 years ago

DinosL commented 5 years ago

I followed the example provided in #12, using an edgelist as input and common neighbours as the predictor, but the ROC plot is empty. Maybe I don't pass the correct arguments to ROCPlotter, but I couldn't find any example. My code is this:


import linkpred
import random
from matplotlib import pyplot as plt
import math

random.seed(100)

# Read network
G = linkpred.read_network('FollowGraph.edgelist')

testSize = math.ceil(len(G.nodes())*0.2)

# Create test network
test = G.subgraph(random.sample(G.nodes(), testSize))

# Exclude test network from learning phase
training = G.copy()
training.remove_edges_from(test.edges())

cn = linkpred.predictors.AdamicAdar(training, excluded=training.edges())
cn_results = cn.predict()

test_set = set(linkpred.evaluation.Pair(u, v) for u, v in test.edges())
evaluation = linkpred.evaluation.EvaluationSheet(cn_results, test_set)

linkpred.evaluation.listeners.ROCPlotter(evaluation)

plt.show()

Any suggestions? Thank you in advance.

DinosL commented 5 years ago

Apparently I am using the ROCPlotter function wrong, but I cannot find anything helpful anywhere.

rafguns commented 5 years ago

Sorry for replying late.

Short answer is that you cannot really use ROCPlotter this way. It will only 'do' things if it receives signals. Currently these are sent in LinkPred.process_predictions. This has turned out to be a bad choice but it would require quite some effort to rework this.

That being said, a basic ROC plot can quite easily be obtained this way:

plt.plot(evaluation.fallout(), evaluation.recall())

Does that help?

DinosL commented 5 years ago

No. I am using the code above with your suggestion and I get the error "Measure is undefined if universe is unknown". My input file is an .edgelist file with the following format:

0 1
2 1
3 1
...

rafguns commented 5 years ago

Ah yes, for ROC plots (specifically for 'fallout' or false positive rate, the values on the horizontal axis) we need to know the total number of possibilities, in order to calculate the number of true negatives (non-edges in the test network that are not predicted).
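To make the role of the universe concrete, here is a minimal sketch in plain Python of how fallout depends on it. The function names and the counts are hypothetical and chosen for illustration only; this is not linkpred's internal implementation.

```python
def fallout(tp, fp, fn, num_universe):
    """False positive rate: FP / (FP + TN).

    TN cannot be counted directly; it is whatever remains of the
    universe after removing true positives, false positives and
    false negatives.
    """
    tn = num_universe - tp - fp - fn
    if tn < 0:
        raise ValueError("universe is smaller than the counted pairs")
    negatives = fp + tn
    return fp / negatives if negatives else 0.0


def recall(tp, fn):
    """True positive rate: TP / (TP + FN). Needs no universe."""
    positives = tp + fn
    return tp / positives if positives else 0.0


# Hypothetical counts for a 10-node test network (45 possible pairs).
num_universe = 10 * 9 // 2
tp, fp, fn = 6, 9, 4

fpr = fallout(tp, fp, fn, num_universe)  # 9 / (9 + 26)
tpr = recall(tp, fn)                     # 6 / (6 + 4)
```

Recall only needs the predictions and the test set, which is why it works without a universe, while fallout cannot be computed until the number of true negatives is known.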

If I understand your setup correctly, you can achieve this by changing the last few lines as follows:

# Determine 'universe', number of all possible edges that could be predicted
n = len(test)
num_universe = n * (n - 1) // 2

test_set = set(linkpred.evaluation.Pair(u, v) for u, v in test.edges())
evaluation = linkpred.evaluation.EvaluationSheet(cn_results, test_set, num_universe)
plt.plot(evaluation.fallout(), evaluation.recall())
plt.show()
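The expression n * (n - 1) // 2 counts the unordered node pairs of an undirected network without self-loops. A small sketch that checks this against an explicit enumeration (the helper name and the node labels are made up for illustration):

```python
from itertools import combinations


def num_possible_pairs(n):
    """Number of unordered pairs among n nodes (undirected, no self-loops)."""
    return n * (n - 1) // 2


# Hypothetical node labels; any hashable values would do.
nodes = ["a", "b", "c", "d", "e"]

# Enumerate every unordered pair explicitly and compare with the formula.
pairs = list(combinations(nodes, 2))
assert len(pairs) == num_possible_pairs(len(nodes))
```

For a directed network the count would be n * (n - 1) instead, since (u, v) and (v, u) are distinct pairs.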

DinosL commented 5 years ago

The code you provided does the job. Thank you for your help.