twitter / cassovary

Cassovary is a simple big graph processing library for the JVM
http://twitter.com/cassovary
Apache License 2.0
1.05k stars 150 forks

Benchmark current performance on public datasets #37

Closed pankajgupta closed 10 years ago

pankajgupta commented 11 years ago

Use the datasets at http://snap.stanford.edu/data/index.html to benchmark the performance of a couple of algorithms on very big graphs, such as (1) global PageRank and (2) personalized PageRank for every node in the graph.
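A minimal sketch of what a benchmark harness for this could look like, assuming the usual SNAP text format (one whitespace-separated `source target` pair per line, with `#` starting a comment line). The names `SnapBenchmarkSketch`, `parseEdges` and `timed` are hypothetical and are not part of Cassovary:

```scala
object SnapBenchmarkSketch {
  /** Parses SNAP-style edge-list lines; blank lines and '#' comment lines are skipped. */
  def parseEdges(lines: Iterator[String]): Array[(Int, Int)] =
    lines
      .map(_.trim)
      .filter(l => l.nonEmpty && !l.startsWith("#"))
      .map { l =>
        val parts = l.split("\\s+")
        (parts(0).toInt, parts(1).toInt)
      }
      .toArray

  /** Runs `body` once and returns its result together with the elapsed wall-clock milliseconds. */
  def timed[A](body: => A): (A, Double) = {
    val start = System.nanoTime()
    val result = body
    (result, (System.nanoTime() - start) / 1e6)
  }
}
```

The lines could come from `scala.io.Source.fromFile(path).getLines()`. A single timed run on the JVM is noisy, so a real benchmark would probably want a few warm-up runs and an average over several repetitions.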

szymonm commented 10 years ago

Hi Pankaj, this issue is quite old. Is it still available?

I see your global PageRank implementation in the algorithms package, so I guess I should start by coding personalized PageRank. Am I correct that the only difference between the personalized version and the global one is the non-uniform probability vector for jumps? In your implementation, that would mean an array of dampingAmount values instead of a single value. This should be passed as a parameter to the algorithm. Am I right?

So shouldn't I generalize the global one? For example, by adding an optional parameter of type Function1[Int, Double] (defaulting to _ => 1) to PageRankParams?
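To make the idea concrete, here is a rough sketch of a PageRank power iteration that takes such a per-node jump-weight function. It is an illustration under my own assumptions, not Cassovary's actual code: the adjacency-array graph representation, the object name `PersonalizedPageRankSketch` and the parameters `dampingFactor`, `iterations` and `jumpWeight` are all hypothetical.

```scala
object PersonalizedPageRankSketch {
  /**
   * Power-iteration PageRank over an adjacency array.
   * outEdges(i) lists the nodes that node i links to.
   * jumpWeight gives each node's relative weight as a jump target;
   * the default (_ => 1.0) is uniform, which gives global PageRank.
   */
  def pageRank(
      outEdges: Array[Array[Int]],
      dampingFactor: Double = 0.85,
      iterations: Int = 50,
      jumpWeight: Int => Double = _ => 1.0
  ): Array[Double] = {
    val n = outEdges.length
    val raw = Array.tabulate(n)(jumpWeight)
    val total = raw.sum
    require(total > 0.0, "jump weights must have a positive sum")
    // Normalize the weights into a probability distribution over jump targets.
    val jump = raw.map(_ / total)

    var rank = jump.clone()
    for (_ <- 0 until iterations) {
      val next = new Array[Double](n)
      var dangling = 0.0
      var i = 0
      while (i < n) {
        val outs = outEdges(i)
        if (outs.isEmpty) dangling += rank(i) // no out-edges: mass is redistributed below
        else {
          val share = dampingFactor * rank(i) / outs.length
          outs.foreach(j => next(j) += share)
        }
        i += 1
      }
      // Jump mass and the mass of dangling nodes both follow the jump distribution.
      val redistributed = 1.0 - dampingFactor + dampingFactor * dangling
      var k = 0
      while (k < n) {
        next(k) += redistributed * jump(k)
        k += 1
      }
      rank = next
    }
    rank
  }
}
```

Calling `pageRank(graph)` with the default gives the global ranking, while `pageRank(graph, jumpWeight = i => if (i == 0) 1.0 else 0.0)` personalizes it for node 0. In this sketch the only thing that changes between the two is the jump distribution.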

pankajgupta commented 10 years ago

Yes, this is still valid. Personalized PageRank already exists. See the method calculatePersonalizedReputation(…) in https://github.com/twitter/cassovary/blob/master/src/main/scala/com/twitter/cassovary/graph/GraphUtils.scala#L136

szymonm commented 10 years ago

This should be closed.