rafguns / linkpred

Easy link prediction tool

EvaluationSheet: fully understand the universe term #31

Open TaousDev opened 4 years ago

TaousDev commented 4 years ago

I don't think I understand the "universe" parameter, or how to choose it in linkpred/evaluation/static/StaticEvaluation() and in EvaluationSheet(). You stated that this parameter is needed to compute the accuracy.

Also, how do I get the confusion matrix, recall, precision, and accuracy?

Concerning the accuracy, do I pick the maximum value, like this: `evaluation.accuracy().max()`? Or is that wrong, and should I do this instead: `acc = (sum(evaluation.tp + evaluation.tn))/(sum(evaluation.tp + evaluation.tn + evaluation.fp + evaluation.fn))`? (I also used `from __future__ import division`.)

I want to use sklearn, but what is confusing me is how to retrieve y_true and y_pred from a graph for `sklearn.metrics.confusion_matrix(y_true, y_pred, *, labels=None, sample_weight=None, normalize=None)`. How do I get this data from the graph so that I can use it in other machine learning algorithms such as SVM?

This is my full code:


```python
import linkpred
import random
from matplotlib import pyplot as plt

random.seed(100)

# Read network
G = linkpred.read_network('BUP_full.net')

# Create test network (random.sample needs a sequence, so wrap the node view in a list)
test = G.subgraph(random.sample(list(G.nodes()), 33))

# Exclude test network from learning phase
training = G.copy()
training.remove_edges_from(test.edges())

simrank = linkpred.predictors.SimRank(training, excluded=training.edges())
simrank_results = simrank.predict(c=0.5)

test_set = set(linkpred.evaluation.Pair(u, v) for u, v in test.edges())
evaluation = linkpred.evaluation.EvaluationSheet(simrank_results, test_set, simrank_results)

plt.plot(evaluation.recall(), evaluation.precision())
```

Thank you

rafguns commented 3 years ago

The universe parameter is an iterable (typically a list or set) of all possible links (i.e. all node pairs) in the graph. Because the number of node pairs grows quadratically with the number of nodes, it can also simply be the number of node pairs (an int). So in your example, I think you could use:

```python
n = len(training)
universe = n * (n - 1) // 2
```
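To make the role of the count concrete, here is a minimal sketch in plain Python that does not use linkpred at all. It assumes that the universe is needed because true negatives cannot be observed directly and have to be derived as "everything else". The node names and the `tp`, `fp`, `fn` counts are invented for illustration only.

```python
from itertools import combinations


def count_pairs(n):
    """Number of unordered node pairs among n nodes (no self-loops)."""
    return n * (n - 1) // 2


# Toy node set (hypothetical); with a real graph, n would be len(training).
nodes = ["a", "b", "c", "d", "e"]

# The universe as an explicit set of pairs...
all_pairs = set(combinations(nodes, 2))
# ...and as a plain integer carrying the same information for counting purposes.
universe = count_pairs(len(nodes))
assert len(all_pairs) == universe

# Hypothetical confusion counts at one cut-off of the ranked predictions.
tp, fp, fn = 2, 1, 1

# True negatives are whatever remains of the universe.
tn = universe - tp - fp - fn
accuracy = (tp + tn) / universe
print(universe, tn, accuracy)
```

The integer form avoids materializing every pair, which matters once the graph has more than a few thousand nodes.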

With the benefit of hindsight, this was a premature optimization that would probably require some fairly substantial work to get rid of. I'll get back to your other questions soon.
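On the `y_true` / `y_pred` question, one possible approach (a sketch under stated assumptions, not linkpred's own method) is to treat every node pair that is not already a training edge as one sample. The pair is labelled 1 in `y_true` if it is a test edge, and 1 in `y_pred` if its predicted score reaches a chosen threshold. The helper names `label_vectors` and `confusion_counts`, the toy graph, the scores, and the threshold are all hypothetical.

```python
from itertools import combinations


def label_vectors(nodes, training_edges, test_edges, scores, threshold):
    """Build parallel y_true / y_pred lists over candidate node pairs."""
    training = {frozenset(e) for e in training_edges}
    test = {frozenset(e) for e in test_edges}
    y_true, y_pred = [], []
    for u, v in combinations(sorted(nodes), 2):
        pair = frozenset((u, v))
        if pair in training:
            continue  # links already known in training are not candidates
        y_true.append(1 if pair in test else 0)
        # Pairs without a score are treated as scored 0.0 (an assumption).
        y_pred.append(1 if scores.get(pair, 0.0) >= threshold else 0)
    return y_true, y_pred


def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


# Toy data (hypothetical): 4 nodes, one training edge, two test edges.
nodes = [1, 2, 3, 4]
training_edges = [(1, 2)]
test_edges = [(2, 3), (3, 4)]
scores = {frozenset((2, 3)): 0.9, frozenset((1, 3)): 0.7, frozenset((3, 4)): 0.1}

y_true, y_pred = label_vectors(nodes, training_edges, test_edges, scores, threshold=0.5)
tp, fp, fn, tn = confusion_counts(y_true, y_pred)

precision = tp / (tp + fp)
recall = tp / (tp + fn)
accuracy = (tp + tn) / len(y_true)
print(y_true, y_pred, (tp, fp, fn, tn), precision, recall, accuracy)
```

The two lists have the shape that `sklearn.metrics.confusion_matrix(y_true, y_pred)` expects, so they could be passed to it directly instead of using the hand-written counter. Note that accuracy depends on the chosen threshold, so a single number is only meaningful for a stated cut-off.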