Benchmark current performance on public datasets

pankajgupta commented 11 years ago

Use the datasets at http://snap.stanford.edu/data/index.html to benchmark the performance of a couple of algorithms on very big graphs, such as (1) global pagerank and (2) personalized pagerank for every node in the graph.
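As a starting point, here is a minimal sketch of a harness for such a benchmark. It assumes the usual SNAP edge-list layout (comment lines starting with `#`, then one whitespace-separated `source destination` pair per line). The names `SnapBenchSketch`, `parseEdges` and `timed` are hypothetical and are not part of Cassovary.

```scala
object SnapBenchSketch {
  // Parse SNAP-style edge-list lines: lines starting with '#' are comments,
  // blank lines are skipped, every other line is "source destination".
  def parseEdges(lines: Iterator[String]): Array[(Int, Int)] =
    lines
      .map(_.trim)
      .filter(l => l.nonEmpty && !l.startsWith("#"))
      .map { l =>
        val parts = l.split("\\s+")
        (parts(0).toInt, parts(1).toInt)
      }
      .toArray

  // Evaluate `body` once and return its result together with the
  // elapsed wall-clock time in milliseconds.
  def timed[A](body: => A): (A, Double) = {
    val start = System.nanoTime()
    val result = body
    (result, (System.nanoTime() - start) / 1e6)
  }
}
```

The lines could come from `scala.io.Source.fromFile(path).getLines()`. A single timed run is only a rough number on the JVM, so a real benchmark would want warm-up runs and several repetitions.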

szymonm commented 10 years ago

Hi Pankaj, this issue is quite old. Is it still open?

I see your global pagerank implementation in the algorithms package, so I guess I should start by coding personalized pagerank. Am I correct that the only difference between the personalized and the global version is the non-uniform probability vector for jumps? In your implementation that would mean an array of dampingAmount values instead of a single value, passed as a parameter to the algorithm. Am I right?

So shouldn't I generalize the global one? For example, by adding an optional parameter of type Function1[Int, Double] (default _ => 1) to PageRankParams?
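To make the question concrete, here is a minimal sketch of that generalization: a plain power-iteration pagerank where the jump distribution comes from an optional `Int => Double` function. This is not Cassovary's implementation or API. The object name, the `outEdges` adjacency-array representation, and the parameter names (`dampingFactor`, `iterations`, `jumpWeight`) are all assumptions made for illustration.

```scala
object PersonalizedPageRankSketch {
  /**
   * Power-iteration pagerank over nodes 0 until outEdges.length.
   * outEdges(i) lists the out-neighbors of node i.
   * jumpWeight gives an unnormalized jump (teleport) weight per node;
   * the default is uniform, which yields the global pagerank.
   */
  def pageRank(
      outEdges: Array[Array[Int]],
      dampingFactor: Double = 0.85,
      iterations: Int = 50,
      jumpWeight: Int => Double = _ => 1.0
  ): Array[Double] = {
    val n = outEdges.length
    val rawJump = Array.tabulate(n)(jumpWeight)
    val total = rawJump.sum
    require(n > 0 && total > 0.0, "need at least one node with positive jump weight")
    // Normalize the jump weights into a probability distribution.
    val jump = rawJump.map(_ / total)

    var rank = jump.clone()
    for (_ <- 0 until iterations) {
      val next = new Array[Double](n)
      var danglingMass = 0.0
      for (i <- 0 until n) {
        val outs = outEdges(i)
        if (outs.isEmpty) danglingMass += rank(i)
        else {
          // Spread the damped rank of node i evenly over its out-neighbors.
          val share = dampingFactor * rank(i) / outs.length
          outs.foreach(j => next(j) += share)
        }
      }
      // Jump mass, plus the damped rank of nodes without out-edges,
      // is redistributed according to the jump distribution.
      val redistributed = (1.0 - dampingFactor) + dampingFactor * danglingMass
      for (i <- 0 until n) next(i) += redistributed * jump(i)
      rank = next
    }
    rank
  }
}
```

Calling `pageRank(graph)` gives the global variant, while `pageRank(graph, jumpWeight = i => if (i == source) 1.0 else 0.0)` personalizes it to a single hypothetical `source` node. In this sketch the two variants differ only in the jump vector.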

pankajgupta commented 10 years ago

Yes, this is still valid. Personalized pagerank already exists. See method calculatePersonalizedReputation(…) in https://github.com/twitter/cassovary/blob/master/src/main/scala/com/twitter/cassovary/graph/GraphUtils.scala#L136

szymonm commented 10 years ago

This should be closed.

twitter / cassovary

Benchmark current performance on public datasets #37