I have a brief idea about it (I used it in my college project) and can implement it in Sage's modules.
I just found that the networkx library already has the PageRank algorithm implemented in Python. Should I implement it in Sage from scratch, or should I write a function that uses the networkx implementation? Or maybe we can write a faster Cython implementation. I need some suggestions!
networkx has several methods to compute PageRank: a pure Python one, one using numpy, another using scipy, etc.
The optional package igraph also has an implementation of PageRank (see the igraph documentation for pagerank). To install igraph, do `sage -i igraph` and `sage -i python_igraph`.
The best thing to do is to create a method with a parameter `algorithm` that calls the different implementations. So `algorithm` could be `None`, `'networkx'`, `'numpy'`, `'scipy'`, or `'igraph'`.
When it is `None`, should an implementation in Sage be used? And if so, should I write it in Python or Cython?
When it's `None`, the method should choose the best available implementation, and the best implementation could depend on the size or density of the (di)graph. For instance, for very large graphs, methods based on matrix multiplication might not be appropriate (memory consumption), while such methods might be very efficient for small graphs. Some measurements are needed to decide.
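A back-of-the-envelope sketch of the memory argument, assuming float64 entries and a CSR sparse layout with int32 indices (roughly what scipy.sparse uses at this size). It ignores the solver's temporary copies, so real peak usage is higher, and a dense eigen-decomposition also costs O(n^3) time on top of the O(n^2) memory.

```python
def dense_matrix_bytes(n):
    """Bytes for an n x n float64 matrix (8 bytes per entry)."""
    return 8 * n * n

def csr_matrix_bytes(n, m):
    """Approximate bytes for an n x n CSR matrix with m nonzeros, assuming
    float64 values plus int32 column indices and int32 row pointers."""
    return 8 * m + 4 * m + 4 * (n + 1)

n, p = 4000, 0.01
m = round(p * n * (n - 1))      # expected number of arcs of a directed G(n, p)
print(dense_matrix_bytes(n))    # 128000000, i.e. about 128 MB
print(m, csr_matrix_bytes(n, m))
```

For a directed G(4000, 0.01) this gives about 128 MB for the dense matrix versus under 2 MB in sparse form.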
Before deciding whether a new implementation is needed, we must know whether what we can easily get is fast enough.
Branch: u/gh-rajat1433/27480_page_rank
Dependencies: #27496
I have installed igraph in my Sage installation, but its pagerank algorithm does not work: it keeps throwing a segmentation fault. I use igraph_graph to convert the Sage graph into an igraph graph, and all other igraph algorithms work fine. This is the error message I get:
Cython backtrace
----------------
29 ../sysdeps/unix/sysv/linux/waitpid.c: No such file or directory.
Traceback (most recent call last):
File "<string>", line 56, in <module>
File "/usr/lib/python3/dist-packages/Cython/Debugger/libcython.py", line 689, in invoke
for arg in string_to_argv(args):
TypeError: argument 1 must be str, not bytes
Saved trace to /home/rajat/.sage/crash_logs/crash_yLpPiW.log
------------------------------------------------------------------------
Unhandled SIGSEGV: A segmentation fault occurred.
This probably occurred because a *compiled* module has a bug
in it and is not properly wrapped with sig_on(), sig_off().
Python will now terminate.
Regarding networkx, numpy and scipy, I have measured their running times. The numpy version builds the (dense) Google matrix and solves for its eigenvectors, networkx has a pure Python implementation, and the scipy version iterates sparse matrix multiplications.
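The two numerical strategies can be sketched in a few lines of numpy. These are simplified stand-ins, not the networkx code: `pagerank_dense_eig` mirrors the idea behind the numpy variant (build the dense n x n Google matrix and take the eigenvector for eigenvalue 1), while `pagerank_power` mirrors the iterative scipy variant but uses `numpy.add.at` over an arc list instead of a `scipy.sparse` matrix, so each step costs O(n + m) rather than O(n^2). Dangling vertices are assumed to jump uniformly to any vertex.

```python
import numpy as np

def pagerank_dense_eig(n, arcs, alpha=0.85):
    """Dense approach: build the n x n Google matrix, take its dominant eigenvector."""
    P = np.zeros((n, n))
    for u, v in arcs:
        P[v, u] += 1.0                  # column u describes the out-links of u
    out = P.sum(axis=0)
    for u in range(n):
        if out[u] > 0:
            P[:, u] /= out[u]           # follow each out-link with equal probability
        else:
            P[:, u] = 1.0 / n           # dangling vertex: jump anywhere
    M = alpha * P + (1 - alpha) / n     # column-stochastic Google matrix
    vals, vecs = np.linalg.eig(M)       # O(n^3) time, O(n^2) memory
    x = np.real(vecs[:, np.argmax(np.real(vals))])
    return x / x.sum()                  # fixes both scale and sign

def pagerank_power(n, arcs, alpha=0.85, tol=1e-12, max_iter=1000):
    """Iterative approach: x <- alpha * P x + teleport, touching each arc once per step."""
    src = np.array([u for u, _ in arcs], dtype=int)
    dst = np.array([v for _, v in arcs], dtype=int)
    out = np.bincount(src, minlength=n).astype(float)
    dangling = out == 0
    x = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        new = np.zeros(n)
        np.add.at(new, dst, x[src] / out[src])      # push rank along every arc
        new = alpha * (new + x[dangling].sum() / n) + (1 - alpha) / n
        if np.abs(new - x).sum() < tol:
            return new
        x = new
    return x

arcs = [(0, 1), (1, 2), (2, 0), (2, 3)]      # vertex 3 has no out-links
print(np.allclose(pagerank_dense_eig(4, arcs), pagerank_power(4, arcs)))  # True
```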
Here are the runtimes I got:
sage: G=nx.gnp_random_graph(40,0.01,directed=True)
sage: %timeit nx.pagerank_numpy(G)
1000 loops, best of 3: 883 µs per loop
sage: %timeit nx.pagerank_scipy(G)
1000 loops, best of 3: 1.7 ms per loop
sage: %timeit nx.pagerank(G)
1000 loops, best of 3: 1.7 ms per loop
sage: G=nx.gnp_random_graph(60,0.01,directed=True)
sage: %timeit nx.pagerank(G)
100 loops, best of 3: 2.87 ms per loop
sage: %timeit nx.pagerank_scipy(G)
1000 loops, best of 3: 1.87 ms per loop
sage: %timeit nx.pagerank_numpy(G)
1000 loops, best of 3: 1.58 ms per loop
sage: G=nx.gnp_random_graph(70,0.01,directed=True)
sage: %timeit nx.pagerank(G)
100 loops, best of 3: 3.66 ms per loop
sage: %timeit nx.pagerank_numpy(G)
100 loops, best of 3: 2.22 ms per loop
sage: %timeit nx.pagerank_scipy(G)
100 loops, best of 3: 1.99 ms per loop
sage: G=nx.gnp_random_graph(80,0.01,directed=True)
sage: %timeit nx.pagerank(G)
100 loops, best of 3: 4.66 ms per loop
sage: %timeit nx.pagerank_scipy(G)
1000 loops, best of 3: 1.93 ms per loop
sage: %timeit nx.pagerank_numpy(G)
100 loops, best of 3: 3.3 ms per loop
sage: G=nx.gnp_random_graph(100,0.01,directed=True)
sage: %timeit nx.pagerank(G)
100 loops, best of 3: 7.07 ms per loop
sage: %timeit nx.pagerank_scipy(G)
100 loops, best of 3: 2.29 ms per loop
sage: %timeit nx.pagerank_numpy(G)
100 loops, best of 3: 4.63 ms per loop
sage: G=nx.gnp_random_graph(400,0.01,directed=True)
sage: %timeit nx.pagerank_numpy(G)
10 loops, best of 3: 175 ms per loop
sage: %timeit nx.pagerank_scipy(G)
100 loops, best of 3: 3.9 ms per loop
sage: %timeit nx.pagerank(G)
10 loops, best of 3: 53.2 ms per loop
sage: G=nx.gnp_random_graph(4000,0.01,directed=True)
sage: %timeit nx.pagerank(G)
1 loop, best of 3: 1.81 s per loop
sage: %timeit nx.pagerank_scipy(G)
1 loop, best of 3: 206 ms per loop
sage: %timeit nx.pagerank_numpy(G)
1 loop, best of 3: 1min 1s per loop
According to my measurements, numpy is fastest up to around 60 vertices, and scipy is fastest beyond that.
I don't know what the problem with igraph is. I have installed igraph and python_igraph on my OS X laptop, and I can do:
sage: G = graphs.RandomGNP(1000, .01)
sage: I = G.igraph_graph()
sage: %timeit I.pagerank()
100 loops, best of 3: 1.92 ms per loop
Try recompiling Sage (`make build` or `sage -b`). If that doesn't work, you can ask for help on sage-devel. Explain what you did to install igraph, that other igraph algorithms work fine for you, and what error message you get.
Note that a random graph on 40 nodes with probability 0.01 has on average only about 8 edges (about 16 arcs in the directed case), so your experiments are not very conclusive on small graphs. Nonetheless, scipy seems to be the most efficient version.
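The expected size of G(n, p) is p times the number of possible pairs: n(n-1)/2 for an undirected graph and n(n-1) for a directed one, since `nx.gnp_random_graph(..., directed=True)` draws each ordered pair independently. A quick check: G(40, 0.01) has about 7.8 edges undirected or 15.6 arcs directed, while G(40, 0.5) already has 390 edges or 780 arcs.

```python
def expected_edges(n, p, directed=False):
    """Expected number of edges of G(n, p) (arcs if directed), loops excluded."""
    pairs = n * (n - 1) if directed else n * (n - 1) // 2
    return p * pairs

for n, p in [(40, 0.01), (40, 0.5), (60, 0.5)]:
    print(n, p, expected_edges(n, p), expected_edges(n, p, directed=True))
```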
More experiments with dense small graphs also suggest that numpy is best for small graphs:
sage: G=nx.gnp_random_graph(40,0.5,directed=True)
sage: **%timeit nx.pagerank_numpy(G)
The slowest run took 86.64 times longer than the fastest. This could mean that an intermediate result is being cached.
1 loop, best of 3: 2.32 ms per loop**
sage: %timeit nx.pagerank_scipy(G)
The slowest run took 60.81 times longer than the fastest. This could mean that an intermediate result is being cached.
100 loops, best of 3: 2.41 ms per loop
sage: %timeit nx.pagerank(G)
100 loops, best of 3: 12.5 ms per loop
sage: G=nx.gnp_random_graph(60,0.5,directed=True)
sage: %timeit nx.pagerank(G)
10 loops, best of 3: 27 ms per loop
sage: %timeit nx.pagerank_scipy(G)
100 loops, best of 3: 3.47 ms per loop
sage: **%timeit nx.pagerank_numpy(G)
100 loops, best of 3: 3.13 ms per loop**
Ticket retargeted after milestone closed (if you don't believe this ticket is appropriate for the Sage 8.8 release please retarget manually)
Changed dependencies from #27496 to #27496, #27502
PageRank in igraph with weighted edges doesn't seem to work: although the pagerank method of igraph has a parameter called weights, I looked at its code and documentation and it is not clear to me whether it is actually used.
https://github.com/igraph/python-igraph/blob/master/src/graphobject.c
https://igraph.org/python/doc/igraph.GraphBase-class.html#personalized_pagerank
sage: G = Graph(6)
sage: I = G.igraph_graph()
sage: I.add_edge(0,0,weight=40)
igraph.Edge(<igraph.Graph object at 0x7f338814e148>, 0, {'weight': 40})
sage: I.add_edge(1,2,weight=50)
igraph.Edge(<igraph.Graph object at 0x7f338814e148>, 1, {'weight': 50})
sage: I.add_edge(2,3,weight=60)
igraph.Edge(<igraph.Graph object at 0x7f338814e148>, 2, {'weight': 60})
sage: I.add_edge(0,3,weight=70)
igraph.Edge(<igraph.Graph object at 0x7f338814e148>, 3, {'weight': 70})
sage: I.add_edge(3,4,weight=80)
igraph.Edge(<igraph.Graph object at 0x7f338814e148>, 4, {'weight': 80})
sage: I.add_edge(4,5,weight=20)
igraph.Edge(<igraph.Graph object at 0x7f338814e148>, 5, {'weight': 20})
sage: I.pagerank(weights='weight')
[0.1380494948975056,
0.11517272680482316,
0.19683228912731204,
0.237940473238224,
0.196832289127312,
0.11517272680482313]
sage: I.pagerank()
[0.1380494948975056,
0.11517272680482316,
0.19683228912731204,
0.237940473238224,
0.196832289127312,
0.11517272680482313]
The following experiments show igraph to be the fastest, thanks to its C implementation.
sage: G = graphs.RandomGNP(40, 0.2)
sage: n = G.networkx_graph()
sage: %timeit nx.pagerank_numpy(n)
1000 loops, best of 3: 1.06 ms per loop
sage: %timeit nx.pagerank_scipy(n)
The slowest run took 37.63 times longer than the fastest. This could mean that an intermediate result is being cached.
100 loops, best of 3: 2.31 ms per loop
sage: %timeit nx.pagerank(n)
100 loops, best of 3: 11.3 ms per loop
sage: i = G.igraph_graph()
sage: %timeit i.pagerank()
10000 loops, best of 3: **23.3 µs** per loop
sage: G = graphs.RandomGNP(40, 0.8)
sage: n = G.networkx_graph()
sage: %timeit nx.pagerank_numpy(n)
1000 loops, best of 3: 1.33 ms per loop
sage: %timeit nx.pagerank_scipy(n)
100 loops, best of 3: 2.16 ms per loop
sage: %timeit nx.pagerank(n)
10 loops, best of 3: 20 ms per loop
sage: i = G.igraph_graph()
sage: %timeit i.pagerank()
10000 loops, best of 3: **39.1 µs** per loop
sage: G = graphs.RandomGNP(4000, 0.5)
sage: n = G.networkx_graph()
sage: i = G.igraph_graph()
sage: %timeit i.pagerank()
1 loop, best of 3: **1.03 s** per loop
sage: n = G.networkx_graph()
sage: %timeit nx.pagerank_numpy(n)
1 loop, best of 3: 1min 9s per loop
sage: %timeit nx.pagerank_scipy(n)
1 loop, best of 3: 7.12 s per loop
sage: %timeit nx.pagerank(n)
....:
....:
Killed
The code of igraph's PageRank algorithm is here: https://github.com/igraph/igraph/blob/master/src/centrality.c
Changed branch from u/gh-rajat1433/27480_page_rank to u/gh-rajat1433/27480_page_rank_implementation
Commit: ecbb4a7
Changed branch from u/gh-rajat1433/27480_page_rank_implementation to u/gh-rajat1433/27480_page_rank_algo
Changed commit from ecbb4a7
to none
Commit: 142e50f
Branch pushed to git repo; I updated commit sha1. New commits:
faafb79 | branch corrected |
Branch pushed to git repo; I updated commit sha1. New commits:
cdef286 | fix small mistakes |
Branch pushed to git repo; I updated commit sha1. New commits:
b4e6b8c | removing spaces |
Sorry for the mess. It was due to an update of the Sage version; it's now corrected :)
I think this is an issue on the igraph side: its PageRank does not handle weighted graphs in Python. https://lists.nongnu.org/archive/html/igraph-help/2008-08/msg00030.html
I am still looking into how igraph's code could be fixed to take weights into account in pagerank. Otherwise, we can use other libraries such as scipy for the weighted case.
I think the problem lies with the version of igraph that Sage currently uses; maybe it has not been updated.
Using Sage:
sage: G = Graph(5)
sage: I = G.igraph_graph()
sage: I.add_edge(0,1,weight=50)
igraph.Edge(<igraph.Graph object at 0x7fe144ca7a00>, 0, {'weight': 50})
sage: I.add_edge(1,2,weight=70)
....:
igraph.Edge(<igraph.Graph object at 0x7fe144ca7a00>, 1, {'weight': 70})
sage: I.add_edge(2,4,weight=35)
igraph.Edge(<igraph.Graph object at 0x7fe144ca7a00>, 2, {'weight': 35})
sage:
....:
sage: I.add_edge(3,4,weight=35)
igraph.Edge(<igraph.Graph object at 0x7fe144ca7a00>, 3, {'weight': 35})
sage: I.pagerank()
[0.134527027027027,
0.24594594594594593,
0.23905405405405405,
0.134527027027027,
0.24594594594594593]
sage: I.pagerank(weights='weight')
[0.134527027027027,
0.24594594594594593,
0.23905405405405405,
0.134527027027027,
0.24594594594594593]
Using a plain Python terminal:
>>> I = Graph(5)
>>> I
<igraph.Graph object at 0x7f5bcd250528>
>>> I.summary()
'IGRAPH U--- 5 0 -- '
>>> I.add_edge(0,1,weight=50)
>>> I.add_edge(1,2,weight=70)
>>> I.add_edge(2,4,weight=35)
>>> I.add_edge(3,4,weight=35)
>>> I.pagerank()
[0.134527027027027, 0.24594594594594593, 0.23905405405405405, 0.134527027027027, 0.24594594594594593]
>>> I.pagerank(weights='weight')
[0.1326579757790889, 0.2898578139644863, 0.2595856492098718, 0.11586448311914739, 0.20203407792740563]
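One way to tell whether a `weights` argument is honoured is to compare against a small reference implementation. The sketch below is a plain power iteration (not igraph's PRPACK code, and the function name is hypothetical) in which a walker at u leaves along edge {u, v} with probability proportional to its weight; dangling mass is spread uniformly and self-loops are not supported. On the 5-vertex example above, with damping 0.85, it should reproduce, up to rounding, the unweighted values printed in both sessions when all weights are 1, and the weighted values printed by the plain Python session when the given weights are used, which suggests that the Sage session silently ignores `weights`.

```python
def weighted_pagerank(n, weighted_edges, damping=0.85, directed=False,
                      tol=1e-13, max_iter=10000):
    """Power-iteration PageRank in which a walker at u moves along edge (u, v)
    with probability w(u, v) / s(u), where s(u) is the total weight leaving u.
    Self-loops are not supported."""
    out = [[] for _ in range(n)]
    for u, v, w in weighted_edges:
        out[u].append((v, w))
        if not directed:
            out[v].append((u, w))       # an undirected edge can be used both ways
    strength = [sum(w for _, w in nbrs) for nbrs in out]
    x = [1.0 / n] * n
    for _ in range(max_iter):
        # Rank held by vertices with no outgoing weight is spread uniformly.
        dangling = sum(x[u] for u in range(n) if strength[u] == 0)
        new = [(1 - damping) / n + damping * dangling / n] * n
        for u in range(n):
            if strength[u] == 0:
                continue
            for v, w in out[u]:
                new[v] += damping * x[u] * w / strength[u]
        if sum(abs(a - b) for a, b in zip(new, x)) < tol:
            return new
        x = new
    return x

edges = [(0, 1, 50), (1, 2, 70), (2, 4, 35), (3, 4, 35)]   # the 5-vertex example above
print(weighted_pagerank(5, [(u, v, 1) for u, v, _ in edges]))  # all weights equal
print(weighted_pagerank(5, edges))                              # weights used
```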
Sage 8.8.beta0 has been released and includes the 2 tickets marked as dependencies, so you can update your develop branch and rebuild this ticket using the latest beta.
If it is not possible to use weights with igraph, add a TODO block in the method saying that it should be done someday.
I already updated to Sage 8.8.beta0 today!
Please read comment 27: I am able to use the weights parameter of igraph in my Python console, but in Sage it does not work (I don't know why there is this discrepancy).
Changed dependencies from #27496, #27502 to none
Should I open a new ticket about this discrepancy in the igraph package shipped with Sage?
I don't understand what's going on. Have you tried other methods using weights?
Yes, other algorithms such as Dijkstra work fine with weights. I think it's an old igraph bug that has been fixed upstream, as the results in my Python shell show, but the fix has not reached the version of igraph in Sage.
Reference: https://lists.nongnu.org/archive/html/igraph-help/2008-08/msg00030.html
Maybe the version that was updated in #27502 is not the same as the version you have on your computer? Upstream should update python_igraph to be at least Python 3 compatible. It currently contains deprecated code, so it doesn't pass the Python 3 tests :(
Changed keywords from none to pagerank
Author: Rajat Mittal
Reviewer: David Coudert
By default, I use scipy for weighted graphs and igraph for unweighted graphs, based on the results reported in the previous comments.
Branch pushed to git repo; I updated commit sha1. New commits:
112cac9 | correct igraph example |
Replying to @rajat1433:
By default, I use scipy for weighted graphs and igraph for unweighted graphs, based on the results reported in the previous comments.
Unfortunately, igraph is an optional package, so you can use it only if it is installed. See e.g. `clique_maximum` to learn how to check whether a package is installed.
Also, I suggest that you first check whether the default should be used, and then select the most suitable default algorithm.
Instead of `implementation`, you should use `algorithm`.
You should raise an error if the given name is unknown.
The name of igraph is `igraph`, so don't use `Igraph`. In fact, you could do `algorithm = algorithm.lower()` to avoid such errors.
`if self.order() == 0:` -> `if not self.order():`
This is not needed: `import igraph`.
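A sketch of the default selection described above (igraph for unweighted graphs, scipy for weighted ones), with the availability check the review asks for. Sage has its own mechanism for optional packages, as used in `clique_maximum`; to keep the sketch runnable in plain Python it uses the standard library's `importlib.util.find_spec` instead, and both function names are hypothetical.

```python
import importlib.util

def module_available(name):
    """True if a top-level Python module can be found (without importing it)."""
    return importlib.util.find_spec(name) is not None

def default_pagerank_algorithm(weighted, available=module_available):
    """Pick a default backend: igraph for unweighted graphs when it is
    installed, scipy otherwise (including every weighted graph, since
    igraph's weights did not seem to be taken into account here)."""
    if not weighted and available("igraph"):
        return "igraph"
    return "scipy"
```

Passing `available` as a parameter keeps the policy testable without installing igraph.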
Branch pushed to git repo; I updated commit sha1. New commits:
e7ae5da | improved the code |
PageRank computes a ranking of the vertices of a graph based on the structure of their incoming links. It is a useful metric for graphs. (https://towardsdatascience.com/graphs-and-paths-pagerank-54f180a1aa0a)
Here is a link to a thesis on this algorithm: http://www.sagemath.org/files/thesis/augeri-thesis-2008.pdf
It would be good to have an implementation of it in Sage's graph module.
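For reference, the usual definition, with damping factor d (0.85 by default in both networkx and igraph), n vertices, and out-degree deg^+(u); a vertex with no out-links is usually treated as linking to every vertex. In the weighted variant discussed in this ticket, the out-degree is replaced by the total weight s^+(u) of the arcs leaving u.

```latex
\mathrm{PR}(v) = \frac{1-d}{n} + d \sum_{u \to v} \frac{\mathrm{PR}(u)}{\deg^{+}(u)},
\qquad
\mathrm{PR}_w(v) = \frac{1-d}{n} + d \sum_{u \to v} \frac{w(u,v)\,\mathrm{PR}_w(u)}{s^{+}(u)},
\qquad
s^{+}(u) = \sum_{u \to x} w(u,x).
```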
CC: @dcoudert
Component: graph theory
Keywords: pagerank
Author: Rajat Mittal
Branch: 07a884b
Reviewer: David Coudert
Issue created by migration from https://trac.sagemath.org/ticket/27480