timlrx / graph-benchmarks

MIT License
79 stars 5 forks source link

Use pagerank_scipy instead of pagerank [networkx]? #9

Open MridulS opened 4 years ago

MridulS commented 4 years ago

In terms of raw performance networkx.pagerank_scipy can be 4-5X faster than networkx.pagerank. For the google.txt file on my local machine.


In [4]: %%timeit
   ...: nx.pagerank(G, alpha=0.85, tol=1e-3, max_iter=10000000)
   ...:
40.1 s ± 5.19 s per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [5]: %%timeit
   ...: nx.pagerank_scipy(G, alpha=0.85, tol=1e-3, max_iter=10000000)
   ...:
8.89 s ± 48.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)```
timlrx commented 4 years ago

Thanks for sharing your results. There's a pagerank_numpy as well. For this benchmark I wanted to access it against a pure python implementation baseline.

szhorvat commented 1 year ago

Page rank calculations do not make a very good comparison between packages because with the naïve algorithm, i.e. iterating a matrix product, performance will depend on the targeted accuracy. How many iterations do you do before the result is "stable enough"?

You can try this out by setting the tol parameter in NetworkX's implementation.

Some packages use more advanced algorithms that produce exact results and do not require a tolerance.