Closed 2460b664-5ecd-43fc-9bbe-d0f333762988 closed 5 years ago
As per comments 9, 12 and 16, I think it would be a good idea to use a threshold on the number of nodes in the graph: for graphs with fewer than 60 nodes the numpy implementation should be the default, and above that the scipy implementation should be used. If you have another idea for a threshold based on density or any other parameter, I would like to know.
Why 60? I thought scipy was fast even for small graphs...
I suggested 60 because of the following results. For small graphs, numpy takes less time than scipy.
sage: G=nx.gnp_random_graph(40,0.01,directed=True)
sage: %timeit nx.pagerank_numpy(G)
1000 loops, best of 3: 883 µs per loop
sage: %timeit nx.pagerank_scipy(G)
1000 loops, best of 3: 1.7 ms per loop
sage: %timeit nx.pagerank(G)
1000 loops, best of 3: 1.7 ms per loop
sage: G=nx.gnp_random_graph(60,0.01,directed=True)
sage: %timeit nx.pagerank(G)
100 loops, best of 3: 2.87 ms per loop
sage: %timeit nx.pagerank_scipy(G)
1000 loops, best of 3: 1.87 ms per loop
sage: %timeit nx.pagerank_numpy(G)
1000 loops, best of 3: 1.58 ms per loop
sage: G=nx.gnp_random_graph(70,0.01,directed=True)
sage: %timeit nx.pagerank(G)
100 loops, best of 3: 3.66 ms per loop
sage: %timeit nx.pagerank_numpy(G)
100 loops, best of 3: 2.22 ms per loop
sage: %timeit nx.pagerank_scipy(G)
100 loops, best of 3: 1.99 ms per loop
sage: G=nx.gnp_random_graph(80,0.01,directed=True)
sage: %timeit nx.pagerank(G)
100 loops, best of 3: 4.66 ms per loop
sage: %timeit nx.pagerank_scipy(G)
1000 loops, best of 3: 1.93 ms per loop
sage: %timeit nx.pagerank_numpy(G)
100 loops, best of 3: 3.3 ms per loop
sage: G=nx.gnp_random_graph(100,0.01,directed=True)
sage: %timeit nx.pagerank(G)
100 loops, best of 3: 7.07 ms per loop
sage: %timeit nx.pagerank_scipy(G)
100 loops, best of 3: 2.29 ms per loop
sage: %timeit nx.pagerank_numpy(G)
100 loops, best of 3: 4.63 ms per loop
sage: G=nx.gnp_random_graph(400,0.01,directed=True)
sage: %timeit nx.pagerank_numpy(G)
10 loops, best of 3: 175 ms per loop
sage: %timeit nx.pagerank_scipy(G)
100 loops, best of 3: 3.9 ms per loop
sage: %timeit nx.pagerank(G)
10 loops, best of 3: 53.2 ms per loop
sage: G=nx.gnp_random_graph(4000,0.01,directed=True)
sage: %timeit nx.pagerank(G)
1 loop, best of 3: 1.81 s per loop
sage: %timeit nx.pagerank_scipy(G)
1 loop, best of 3: 206 ms per loop
sage: %timeit nx.pagerank_numpy(G)
1 loop, best of 3: 1min 1s per loop
sage: G=nx.gnp_random_graph(40,0.5,directed=True)
sage: %timeit nx.pagerank_numpy(G)
The slowest run took 86.64 times longer than the fastest. This could mean that an intermediate result is being cached.
1 loop, best of 3: 2.32 ms per loop
sage: %timeit nx.pagerank_scipy(G)
The slowest run took 60.81 times longer than the fastest. This could mean that an intermediate result is being cached.
100 loops, best of 3: 2.41 ms per loop
sage: %timeit nx.pagerank(G)
100 loops, best of 3: 12.5 ms per loop
sage: G=nx.gnp_random_graph(60,0.5,directed=True)
sage: %timeit nx.pagerank(G)
10 loops, best of 3: 27 ms per loop
sage: %timeit nx.pagerank_scipy(G)
100 loops, best of 3: 3.47 ms per loop
sage: %timeit nx.pagerank_numpy(G)
100 loops, best of 3: 3.13 ms per loop
sage: G = graphs.RandomGNP(40, 0.2)
sage: n = G.networkx_graph()
sage: %timeit nx.pagerank_numpy(n)
1000 loops, best of 3: 1.06 ms per loop
sage: %timeit nx.pagerank_scipy(n)
The slowest run took 37.63 times longer than the fastest. This could mean that an intermediate result is being cached.
100 loops, best of 3: 2.31 ms per loop
sage: %timeit nx.pagerank(n)
100 loops, best of 3: 11.3 ms per loop
sage: i = G.igraph_graph()
sage: %timeit i.pagerank()
10000 loops, best of 3: **23.3 µs** per loop
sage: G = graphs.RandomGNP(40, 0.8)
sage: n = G.networkx_graph()
sage: %timeit nx.pagerank_numpy(n)
1000 loops, best of 3: 1.33 ms per loop
sage: %timeit nx.pagerank_scipy(n)
100 loops, best of 3: 2.16 ms per loop
sage: %timeit nx.pagerank(n)
10 loops, best of 3: 20 ms per loop
sage: i = G.igraph_graph()
sage: %timeit i.pagerank()
10000 loops, best of 3: **39.1 µs** per loop
sage: G = graphs.RandomGNP(4000, 0.5)
sage: n = G.networkx_graph()
sage: i = G.igraph_graph()
sage: %timeit i.pagerank()
1 loop, best of 3: **1.03 s** per loop
sage: n = G.networkx_graph()
sage: %timeit nx.pagerank_numpy(n)
1 loop, best of 3: 1min 9s per loop
sage: %timeit nx.pagerank_scipy(n)
1 loop, best of 3: 7.12 s per loop
OK.
I have added the default values and completed the code.
We usually use vertex/vertices and not node/nodes.
- Return the PageRank of the nodes in the graph.
+ Return the PageRank of the vertices of ``self``.
- PageRank calculates the ranking of nodes in the graph G based on the
- structure of the incoming links. It is popularly used to rank web
- pages.
+ PageRank is a centrality measure first used to rank web pages.
+ The PageRank algorithm outputs the probability distribution that
+ a random walker in the graph visits a vertex.
+ if algorithm:
+     algorithm = algorithm.lower()
+ elif self.order <= 60:
+     algorithm = 'numpy'
+ else:
+     algorithm = 'scipy'
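The default selection suggested above can be read as a small dispatch rule. Here is a minimal standalone sketch of it, not the code from the branch: `choose_pagerank_backend` and its `threshold` argument are hypothetical names, and 60 is only the crossover suggested by the timings in this ticket.

```python
SUPPORTED = ("networkx", "numpy", "scipy", "igraph")


def choose_pagerank_backend(order, algorithm=None, threshold=60):
    """Return the lowercase backend name for a graph with ``order`` vertices.

    Hypothetical helper: an explicit ``algorithm`` always wins, otherwise
    small graphs go to numpy and larger ones to scipy.
    """
    if algorithm:
        algorithm = algorithm.lower()
    elif order <= threshold:
        algorithm = "numpy"
    else:
        algorithm = "scipy"
    if algorithm not in SUPPORTED:
        raise NotImplementedError(
            "Only 'NetworkX', 'Numpy', 'Scipy', and 'igraph' are supported")
    return algorithm


print(choose_pagerank_backend(40))
print(choose_pagerank_backend(4000))
print(choose_pagerank_backend(4000, algorithm="NetworkX"))
```

Keeping the threshold as an argument makes it easy to revisit the value if timings on other machines or densities give a different crossover.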
sage: import networkx
sage: G = graphs.CycleGraph(4)
sage: N = G.networkx_graph()
sage: print(networkx.pagerank(N))
{0: 0.25, 1: 0.25, 2: 0.25, 3: 0.25}
sage: N = (G+G).networkx_graph()
sage: print(networkx.pagerank(N))
{0: 0.125, 1: 0.125, 2: 0.125, 3: 0.125, 4: 0.125, 5: 0.125, 6: 0.125, 7: 0.125}
sage: H = graphs.CycleGraph(10)
sage: N = H.networkx_graph()
sage: print(networkx.pagerank(N))
{0: 0.1, 1: 0.1, 2: 0.1, 3: 0.1, 4: 0.1, 5: 0.1, 6: 0.1, 7: 0.1, 8: 0.1, 9: 0.1}
sage: N = (G+H).networkx_graph()
sage: print(networkx.pagerank(N))
{0: 0.07142857142857142, 1: 0.07142857142857142, 2: 0.07142857142857142, 3: 0.07142857142857142, 4: 0.07142857142857142, 5: 0.07142857142857142, 6: 0.07142857142857142, 7: 0.07142857142857142, 8: 0.07142857142857142, 9: 0.07142857142857142, 10: 0.07142857142857142, 11: 0.07142857142857142, 12: 0.07142857142857142, 13: 0.07142857142857142}
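The values in the session above can be reproduced without any library. The following is a minimal pure Python power iteration, written as a sketch under the usual normalized convention (ranks sum to 1, uniform restart); `pagerank` and `cycle` are illustrative helpers, not the networkx or Sage implementations.

```python
def pagerank(out_neighbors, alpha=0.85, tol=1e-12, max_iter=1000):
    """Power iteration on a graph given as {vertex: list of out-neighbors}."""
    vertices = list(out_neighbors)
    n = len(vertices)
    rank = {v: 1.0 / n for v in vertices}
    for _ in range(max_iter):
        # Rank held by vertices without out-neighbors is spread uniformly.
        dangling = sum(rank[v] for v in vertices if not out_neighbors[v])
        new = {v: (1.0 - alpha) / n + alpha * dangling / n for v in vertices}
        for u in vertices:
            for w in out_neighbors[u]:
                new[w] += alpha * rank[u] / len(out_neighbors[u])
        if sum(abs(new[v] - rank[v]) for v in vertices) < tol:
            return new
        rank = new
    raise RuntimeError("power iteration did not converge")


def cycle(n, offset=0):
    """Undirected cycle on vertices offset, ..., offset + n - 1."""
    return {offset + i: [offset + (i - 1) % n, offset + (i + 1) % n]
            for i in range(n)}


pr_c4 = pagerank(cycle(4))
pr_union = pagerank({**cycle(4), **cycle(10, offset=4)})
print(pr_c4)
print(pr_union)
```

In this uniform case the rank of a vertex in the disjoint union equals its rank in its own component multiplied by the component's share of the vertices (4/14 for the 4-cycle), which is the normalization factor a per-component optimization would rely on.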
Branch pushed to git repo; I updated commit sha1. New commits:
552a429 | improved code |
Optimizing the code by computing PageRank on the subgraph of each connected component will work for undirected graphs, since the normalization factor is easy to find. For digraphs, however, nodes can be dangling; in that case the dangling node is connected to the other nodes of the graph with some edge weight. In such cases this optimization may not be desired.
It can also be left for a future ticket.
For undirected graphs, I can do the optimization in this ticket only. :)
Even for undirected graphs, the optimization may not work when the personalization parameter is set, as the example below shows:
sage: g = Graph([(1,2,3),(2,3,5),(3,5,8),(2,5,13),(2,4,25)])
sage: personalization={}
sage: personalization[1] = 8
sage: personalization[2] = 16
sage: personalization[3] = 24
sage: personalization[4] = 36
sage: personalization[5] = 25
sage: g.pagerank(by_weight=True, personalization=personalization)
sage: personalization2={}
sage: personalization2[1] = 84
sage: personalization2[2] = 42
sage: personalization2[3] = 30
sage: personalization2[4] = 70
sage: f = Graph([(1,2,2),(2,3,15),(2,4,22)])
sage: f.pagerank(by_weight=True,personalization=personalization2)
{1: 0.07643670757326188,
2: 0.474528450109004,
3: 0.17504521830388875,
4: 0.27398962401384513}
sage: q = g + f
sage: personalization[0] = 8
sage: personalization[1] = 16
sage: personalization[2] = 24
sage: personalization[3] = 36
sage: personalization[4] = 25
sage: personalization[5] = 84
sage: personalization[6] = 42
sage: personalization[7] = 30
sage: personalization[8] = 70
sage: q.pagerank(by_weight=True,personalization=personalization)
{0: 0.010763853970772248,
1: 0.12955307566026533,
2: 0.04384414464203576,
3: 0.07596743980618663,
4: 0.0652489170000004,
5: 0.05156608491296348,
6: 0.3201286892373546,
7: 0.11808892042931564,
8: 0.18483887434110574}
So I guess it's best to keep the method as it is for now.
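A smaller instance of the same effect can be checked by hand. This is a sketch with a hypothetical pure Python `pagerank` helper (normalized convention, personalization used as the restart distribution), on a toy graph made of two disjoint edges; it is not the Sage method.

```python
def pagerank(out_neighbors, alpha=0.85, personalization=None,
             tol=1e-12, max_iter=1000):
    """Power iteration with an optional personalization (restart) vector."""
    vertices = list(out_neighbors)
    if personalization is None:
        personalization = {v: 1.0 for v in vertices}
    total = float(sum(personalization.values()))
    p = {v: personalization.get(v, 0.0) / total for v in vertices}
    rank = {v: 1.0 / len(vertices) for v in vertices}
    for _ in range(max_iter):
        # Rank held by vertices without out-neighbors is redistributed along p.
        dangling = sum(rank[v] for v in vertices if not out_neighbors[v])
        new = {v: ((1.0 - alpha) + alpha * dangling) * p[v] for v in vertices}
        for u in vertices:
            for w in out_neighbors[u]:
                new[w] += alpha * rank[u] / len(out_neighbors[u])
        if sum(abs(new[v] - rank[v]) for v in vertices) < tol:
            return new
        rank = new
    raise RuntimeError("power iteration did not converge")


# Two disjoint edges {0, 1} and {2, 3}, with a non-uniform personalization.
whole = {0: [1], 1: [0], 2: [3], 3: [2]}
weights = {0: 3, 1: 1, 2: 1, 3: 1}
component = {0: [1], 1: [0]}

pr_whole = pagerank(whole, personalization=weights)
pr_component = pagerank(component, personalization={0: 3, 1: 1})

by_vertex_count = pr_component[0] * 2 / 4        # share of vertices
by_personalization = pr_component[0] * (3 + 1) / 6  # share of personalization
print(pr_whole[0], by_vertex_count, by_personalization)
```

On this toy instance, rescaling the component result by its share of vertices does not reproduce the whole-graph value, while rescaling by its share of the personalization weights does. Whether that carries over to the general case (in particular with dangling vertices in digraphs) is not checked here, so keeping the method as it is remains the safe choice.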
; . So
- - ``alpha`` -- float (default: ``0.85``); Damping parameter for
+ - ``alpha`` -- float (default: ``0.85``); damping parameter for
You should give slightly more details on what the parameters are. For instance, the PageRank value for vertices without in-neighbors is 1 - damping. Furthermore, the damping factor (this term is better than parameter here) is the probability of resetting the random walk to a uniform distribution in each step.
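To make the role of the damping factor concrete, here is a minimal pure Python sketch on a three-vertex digraph where vertex 0 has no in-neighbors. It assumes the normalized convention in which ranks sum to 1 and, at each step, the walker follows an out-edge with probability `alpha` and restarts uniformly with probability `1 - alpha`; `pagerank` is an illustrative helper, not the Sage method.

```python
def pagerank(out_neighbors, alpha=0.85, tol=1e-12, max_iter=1000):
    """Power iteration on a digraph given as {vertex: list of out-neighbors}."""
    vertices = list(out_neighbors)
    n = len(vertices)
    rank = {v: 1.0 / n for v in vertices}
    for _ in range(max_iter):
        # Rank held by vertices without out-neighbors is spread uniformly.
        dangling = sum(rank[v] for v in vertices if not out_neighbors[v])
        new = {v: (1.0 - alpha) / n + alpha * dangling / n for v in vertices}
        for u in vertices:
            for w in out_neighbors[u]:
                new[w] += alpha * rank[u] / len(out_neighbors[u])
        if sum(abs(new[v] - rank[v]) for v in vertices) < tol:
            return new
        rank = new
    raise RuntimeError("power iteration did not converge")


# 0 -> 1, 1 -> 2, 2 -> 1: vertex 0 has no in-neighbors, no vertex is dangling.
D = {0: [1], 1: [2], 2: [1]}
for alpha in (0.85, 0.5):
    pr = pagerank(D, alpha=alpha)
    print(alpha, pr[0], (1 - alpha) / len(D))
```

Under this normalization, a vertex without in-neighbors only receives the restart share, so its value is (1 - alpha) / n when the graph has no dangling vertices; without normalization by n this is the "1 - damping" value mentioned above.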
As reported by the patchbot. Please check it
- .. SEEALSO:
+ .. SEEALSO::
- elif self.order <= 60:
+ elif self.order() <= 60:
.vertices() when not necessary (it sorts vertices)
- if by_weight:
-     I = self.igraph_graph(edge_attrs={'weight': [weight_function(e)
-                                                  for e in self.edge_iterator()]})
-     page_rank = I.pagerank(damping=alpha, weights='weight')
-     return {v: page_rank[i] for i, v in enumerate(self.vertices())}
- else:
-     I = self.igraph_graph()
-     page_rank = I.pagerank(damping=alpha)
-     return {v: page_rank[i] for i, v in enumerate(self.vertices())}
+ if by_weight:
+     I = self.igraph_graph(edge_attrs={'weight': [weight_function(e)
+                                                  for e in self.edge_iterator()]})
+ else:
+     I = self.igraph_graph()
+ page_rank = I.pagerank(damping=alpha, weights=weight)
+ return {v: page_rank[i] for i, v in enumerate(self)}
raise NotImplementedError("Only 'NetworkX', 'Numpy', 'Scipy', and 'igraph' are supported")
I have added more info on damping parameter alpha and improved the code as per review.
On my system all tests pass, but the patchbot is not reflecting that here. How long does it take for the patchbot to be invoked?
$ ./sage -t src/sage/graphs/generic_graph.py
Running doctests with ID 2019-04-07-20-11-50-05efbf4e.
Git branch: u/gh-rajat1433/27480_page_rank_algo
Using --optional=dochtml,igraph,memlimit,mpir,python2,python_igraph,sage
Doctesting 1 file.
sage -t --warn-long 24.3 src/sage/graphs/generic_graph.py
[3391 tests, 26.98 s]
----------------------------------------------------------------------
All tests passed!
----------------------------------------------------------------------
Total time for all tests: 28.0 seconds
cpu time: 19.8 seconds
cumulative wall time: 27.0 seconds
- :meth:`~GenericGraph.pagerank` | Return the PageRank of the nodes in the graph.
+ :meth:`~GenericGraph.pagerank` | Return the PageRank of the vertices of ``self``.
- Return the PageRank of the vertices in the graph.
+ Return the PageRank of the vertices of ``self``.
The documentation requires further improvements. I know that the documentations of networkx/igraph are not very clear, but we can do better.
personalization parameter is used for. We can at least clarify the text and be closer to what is explained in networkx
- - ``personalization`` -- dict (default: ``None``); the "personalization
- vector" consisting of a dictionary with a key for every graph node
- and nonzero personalization value for each node.
- By default, a uniform distribution is used.
+ - ``personalization`` -- dict (default: ``None``); a dictionary keyed by
+ vertices associating to each vertex a value. The personalization can be
+ specified for a subset of the vertices only, and the sum of the values
+ must be nonzero.
+ By default (``None``), a uniform distribution is used.
If you understand what dangling is, please improve the description.
You should indicate the parameters that cannot be used with an algorithm or that can only be used with an algorithm.
Maybe it would be better to start the list of parameters with the algorithms? Well, think about it.
The note "Note that ``'networkx'`` does not support multigraphs." could be added directly in the line - ``NetworkX`` -- ...
You could add a TESTS block with errors, specific cases, etc. For instance with personalization={0: 1, 1: -1}
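As an illustration of the kind of corner case such a TESTS block could cover, here is a small sketch of a validation step for the personalization dictionary. `normalize_personalization` is a hypothetical helper written for this comment; it does not describe what Sage, networkx or igraph actually raise on such input.

```python
def normalize_personalization(vertices, personalization=None):
    """Return a restart distribution over ``vertices`` summing to 1.

    Hypothetical helper: vertices missing from ``personalization`` get 0,
    and a dictionary whose values sum to zero is rejected.
    """
    vertices = list(vertices)
    if personalization is None:
        return {v: 1.0 / len(vertices) for v in vertices}
    unknown = set(personalization) - set(vertices)
    if unknown:
        raise ValueError("personalization keys are not vertices: %r" % unknown)
    total = float(sum(personalization.values()))
    if total == 0:
        raise ValueError("the sum of the personalization values must be nonzero")
    return {v: personalization.get(v, 0) / total for v in vertices}


print(normalize_personalization(range(4), {0: 1}))
try:
    normalize_personalization(range(4), {0: 1, 1: -1})
except ValueError as err:
    print("rejected:", err)
```

With personalization={0: 1, 1: -1} the values sum to zero, so normalization would divide by zero; a doctest showing the resulting error documents that case.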
Also, note that using a graph like a 4-cycle is interesting to illustrate the effect of the various parameters. Indeed, the pagerank of each vertex is 0.25. Note that the example you have is also interesting, as it shows that the result differs slightly from one algorithm to another.
As for the patchbot, I don't know when it is invoked. It depends on availability, and therefore on the number of users who kindly offer computing resources for it and spend time maintaining the system.
Branch pushed to git repo; I updated commit sha1. New commits:
06dbcc6 | revert some changes |
I have rebased my branch on 8.8 beta 1 and did the changes as mentioned in comment 57.
Branch pushed to git repo; I updated commit sha1. New commits:
30d493d | remved unnecessary spaces |
Further improvements are certainly possible, but it's already good. Do at least this one (plurals)
- Parameter ``alpha``, ``by_weight`` and ``weight_function`` is common
- for all algorithms but ``personalization`` and ``dangling``
- parameters are used only in ``NetworkX``, ``Numpy`` and ``Scipy``
- implementations.
+ Parameters ``alpha``, ``by_weight`` and ``weight_function`` are common
+ to all algorithms. Parameters ``personalization`` and ``dangling``
+ are used only by algorithms ``NetworkX``, ``Numpy`` and ``Scipy``.
Branch pushed to git repo; I updated commit sha1. New commits:
804fe60 | improved note |
Done the changes!
LGTM.
Changed branch from u/gh-rajat1433/27480_page_rank_algo to 804fe60
Changed branch from 804fe60 to u/gh-rajat1433/27480_page_rank_algo
**********************************************************************
File "src/sage/graphs/generic_graph.py", line 9536, in sage.graphs.generic_graph.GenericGraph.?
Failed example:
G.pagerank(alpha=0.50, algorithm="igraph")
Exception raised:
Traceback (most recent call last):
File "/var/lib/buildbot/slave/sage_git/build/local/lib/python2.7/site-packages/sage/doctest/forker.py", line 671, in _run
self.compile_and_execute(example, compiler, test.globs)
File "/var/lib/buildbot/slave/sage_git/build/local/lib/python2.7/site-packages/sage/doctest/forker.py", line 1095, in compile_and_execute
exec(compiled, globs)
File "<doctest sage.graphs.generic_graph.GenericGraph.?[2]>", line 1, in <module>
G.pagerank(alpha=RealNumber('0.50'), algorithm="igraph")
File "/var/lib/buildbot/slave/sage_git/build/local/lib/python2.7/site-packages/sage/graphs/generic_graph.py", line 9668, in pagerank
raise PackageNotFoundError("igraph")
PackageNotFoundError: the package 'igraph' was not found. You can install it by running 'sage -i igraph' in a shell
**********************************************************************
File "src/sage/graphs/generic_graph.py", line 9581, in sage.graphs.generic_graph.GenericGraph.?
Failed example:
G.pagerank(algorithm="igraph")
Exception raised:
Traceback (most recent call last):
File "/var/lib/buildbot/slave/sage_git/build/local/lib/python2.7/site-packages/sage/doctest/forker.py", line 671, in _run
self.compile_and_execute(example, compiler, test.globs)
File "/var/lib/buildbot/slave/sage_git/build/local/lib/python2.7/site-packages/sage/doctest/forker.py", line 1095, in compile_and_execute
exec(compiled, globs)
File "<doctest sage.graphs.generic_graph.GenericGraph.?[10]>", line 1, in <module>
G.pagerank(algorithm="igraph")
File "/var/lib/buildbot/slave/sage_git/build/local/lib/python2.7/site-packages/sage/graphs/generic_graph.py", line 9668, in pagerank
raise PackageNotFoundError("igraph")
PackageNotFoundError: the package 'igraph' was not found. You can install it by running 'sage -i igraph' in a shell
**********************************************************************
1 item had failures:
2 of 890 in sage.graphs.generic_graph.GenericGraph.?
[3393 tests, 2 failures, 36.96 s]
**********************************************************************
@Rajat: add # optional - python_igraph to each doctest with igraph, like this
- sage: G.pagerank(alpha=0.50, algorithm="igraph")
+ sage: G.pagerank(alpha=0.50, algorithm="igraph") # optional - python_igraph
{0: 0.25, 1: 0.25, 2: 0.24999999999999997, 3: 0.24999999999999997}
and
- sage: G.pagerank(algorithm="igraph")
+ sage: G.pagerank(algorithm="igraph") # optional - python_igraph
@Volker: sorry for this.
Branch pushed to git repo; I updated commit sha1. New commits:
f10629a | added optional igraph parameter |
Added the optional igraph tag. I understand its importance: without it, the tests may fail on systems where igraph is not installed.
$ ./sage -t src/sage/graphs/generic_graph.py
Running doctests with ID 2019-04-14-09-52-00-5ed75243.
Git branch: u/gh-rajat1433/27480_page_rank_algo
Using --optional=dochtml,igraph,memlimit,mpir,python2,python_igraph,sage
Doctesting 1 file.
sage -t --warn-long 24.3 src/sage/graphs/generic_graph.py
[3402 tests, 27.42 s]
----------------------------------------------------------------------
All tests passed!
----------------------------------------------------------------------
Total time for all tests: 28.3 seconds
cpu time: 19.0 seconds
cumulative wall time: 27.4 seconds
On my system all tests pass, but the patchbot reports some failed examples and I don't know why...
This is numerical noise due to floating point arithmetic. See http://doc.sagemath.org/html/en/developer/coding_basics.html#special-markup-to-influence-doctests
A solution is to add # abs tol 1e-9 to all doctests.
- sage: G.pagerank(algorithm="Numpy")
+ sage: G.pagerank(algorithm="Numpy") # abs tol 1e-9
...
- sage: G.pagerank()
+ sage: G.pagerank() # abs tol 1e-9
also for igraph
- sage: G.pagerank(alpha=0.50, algorithm="igraph") # optional - python_igraph
+ sage: G.pagerank(alpha=0.50, algorithm="igraph") # optional - python_igraph # abs tol 1e-9
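The idea behind the tolerance flag can be illustrated in plain Python. This is only a sketch of an absolute-tolerance comparison, with a hypothetical `dicts_close` helper; it is not how the Sage doctest framework is implemented.

```python
import math


def dicts_close(got, expected, abs_tol=1e-9):
    """Compare two {vertex: float} results up to an absolute tolerance."""
    return (got.keys() == expected.keys()
            and all(math.isclose(got[k], expected[k],
                                 rel_tol=0.0, abs_tol=abs_tol)
                    for k in expected))


# Values taken from the igraph doctest discussed in this ticket.
expected = {0: 0.25, 1: 0.25, 2: 0.24999999999999997, 3: 0.24999999999999997}
got = {0: 0.25, 1: 0.25, 2: 0.25, 3: 0.25}

print(got == expected)
print(dicts_close(got, expected))
```

An exact comparison distinguishes 0.25 from the float just below it, which is why the same doctest can pass on one machine and fail on another; comparing up to 1e-9 makes the expected output robust to that noise.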
Branch pushed to git repo; I updated commit sha1. New commits:
07a884b | floating point flag included |
LGTM
Changed branch from u/gh-rajat1433/27480_page_rank_algo to 07a884b
There is an issue with igraph on the arando patchbot with 8.8.b4 and 8.8.b5:
**********************************************************************
File "src/sage/graphs/generic_graph.py", line 9663, in sage.graphs.generic_graph.GenericGraph.?
Failed example:
G.pagerank(alpha=0.50, algorithm="igraph") # optional - python_igraph
Expected:
{0: 0.25, 1: 0.25, 2: 0.24999999999999997, 3: 0.24999999999999997}
Got:
{0: 0.25, 1: 0.25, 2: 0.25, 3: 0.25}
**********************************************************************
1 item had failures:
1 of 860 in sage.graphs.generic_graph.GenericGraph.?
[3497 tests, 1 failure, 50.96 s]
----------------------------------------------------------------------
sage -t --long src/sage/graphs/generic_graph.py # 1 doctest failed
I think I forgot to add # abs tol 1e-9 to this test.
Thanks David for opening #27811 for fixing it
PageRank computes the ranking of the nodes of a graph based on the structure of the incoming links. It is a useful metric for graphs. (https://towardsdatascience.com/graphs-and-paths-pagerank-54f180a1aa0a)
Below is a link to a thesis on this algorithm: http://www.sagemath.org/files/thesis/augeri-thesis-2008.pdf
It would be good to have an implementation of it in Sage's graph module.
CC: @dcoudert
Component: graph theory
Keywords: pagerank
Author: Rajat Mittal
Branch: 07a884b
Reviewer: David Coudert
Issue created by migration from https://trac.sagemath.org/ticket/27480