rafguns / linkpred

Easy link prediction tool

Simplify evaluation to plain functions #3

Open rafguns opened 9 years ago

rafguns commented 9 years ago

sklearn.metrics is, in a way, much simpler because it uses plain functions. Can we do something analogous, or even depend on scikit-learn for things like ROC, recall-precision, etc.?

rafguns commented 4 years ago

Looking at this again. The main issue is that linkpred uses its own data structure (Scoresheet) to track prediction scores. This has at least two advantages:

  1. Order of nodes is never a problem: (a,b) == (b,a) and ranking of pairs with the same scores is deterministic
  2. Only node pairs for which there is a prediction need to be tracked, which is less memory-intensive. This is especially a concern for larger networks. E.g. 5000 nodes yield 5000 × 4999 / 2 = 12497500 node pairs.
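
To make the two properties concrete, here is a minimal standard-library sketch. It is not linkpred's actual Scoresheet implementation; `pair_key` and the example scores are hypothetical, and it assumes node labels can be compared with each other.

```python
from math import comb


def pair_key(u, v):
    """Return an order-independent key for an undirected node pair."""
    return (u, v) if u <= v else (v, u)


# Sparse score table: only pairs that received a prediction are stored.
scores = {}
scores[pair_key("b", "a")] = 0.5
scores[pair_key("a", "c")] = 0.9
scores[pair_key("c", "d")] = 0.5

# (a, b) and (b, a) resolve to the same entry.
same_entry = pair_key("a", "b") == pair_key("b", "a")

# Deterministic ranking: descending score, ties broken on the pair itself.
ranking = sorted(scores, key=lambda pair: (-scores[pair], pair))

# Number of unordered node pairs in a 5000-node network.
n_pairs = comb(5000, 2)
```

The table holds three entries no matter how many nodes the network has, whereas a dense representation needs one slot per possible pair (`n_pairs`).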

Point 2 in particular is fundamentally different from scikit-learn, whose metric functions expect a label and a score for every sample.

The way forward is probably to replace Scoresheet with a pandas Series whose index contains all node pairs and whose values are the scores. The index could be built prior to evaluation:

idx = pd.MultiIndex.from_tuples(itertools.combinations(G.nodes(), 2))

and shared across evaluations. The underlying numpy array could then be passed to scikit-learn metrics.
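
A rough sketch of how that could look, using a plain node list in place of `G.nodes()`. The predictions and test links are made up for illustration, and giving unpredicted pairs a score of 0 is an assumption, not something decided in this issue.

```python
import itertools

import pandas as pd
from sklearn.metrics import roc_auc_score

nodes = ["a", "b", "c", "d"]  # stand-in for G.nodes()

# Shared index over all unordered node pairs, built once.
idx = pd.MultiIndex.from_tuples(itertools.combinations(nodes, 2))

# Hypothetical sparse predictions; pairs without one are assumed to score 0.
predictions = {("a", "b"): 0.9, ("a", "c"): 0.4, ("c", "d"): 0.7}
y_score = pd.Series([predictions.get(pair, 0.0) for pair in idx], index=idx)

# Hypothetical test-set links, turned into 1/0 labels on the same index.
test_links = {("a", "b"), ("b", "d")}
y_true = pd.Series([int(pair in test_links) for pair in idx], index=idx)

# Both Series share the index, so their numpy arrays are aligned.
auc = roc_auc_score(y_true.to_numpy(), y_score.to_numpy())
```

Note that the dense index gives up advantage 2: every possible pair gets a slot, whether or not it was predicted.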

I am not yet sure how best to deal with point 1, or to what extent it constitutes a problem.
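
One possible direction for point 1, sketched under assumptions: normalise every pair to the orientation that `itertools.combinations` produced before touching the Series, and rank with a stable sort so ties fall back on index order. The `canonical` helper and the `position` lookup are hypothetical, not part of linkpred.

```python
import itertools

import pandas as pd

nodes = ["c", "a", "b"]  # hypothetical order, as G.nodes() might return it
idx = pd.MultiIndex.from_tuples(itertools.combinations(nodes, 2))
position = {node: i for i, node in enumerate(nodes)}


def canonical(u, v):
    """Orient a pair the same way itertools.combinations emitted it."""
    return (u, v) if position[u] < position[v] else (v, u)


scores = pd.Series(0.0, index=idx)
scores.loc[canonical("a", "c")] = 0.5  # stored under ("c", "a")
scores.loc[canonical("b", "a")] = 0.5  # stored under ("a", "b")

# Stable ascending sort on negated scores: highest score first,
# tied pairs keep the fixed order of the shared index.
ranked_pairs = list((-scores).sort_values(kind="stable").index)
```

With this, lookups no longer depend on the order in which a pair's nodes are given, and the ranking of tied pairs is reproducible as long as the index is built from the same node order.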