sagemath / sage

Main repository of SageMath
https://www.sagemath.org
Other
1.44k stars 480 forks source link

Page Rank algorithm implementation for Graphs #27480

Closed 2460b664-5ecd-43fc-9bbe-d0f333762988 closed 5 years ago

2460b664-5ecd-43fc-9bbe-d0f333762988 commented 5 years ago

Page Rank computes the ranking of the nodes of the graph based on the structure of the incoming links. It is a useful metrics in graphs and can be quite useful. (https://towardsdatascience.com/graphs-and-paths-pagerank-54f180a1aa0a)

Below is a link to a thesis on this algorithm http://www.sagemath.org/files/thesis/augeri-thesis-2008.pdf

It would be good to have its implementation in the Sage's graph module.

CC: @dcoudert

Component: graph theory

Keywords: pagerank

Author: Rajat Mittal

Branch: 07a884b

Reviewer: David Coudert

Issue created by migration from https://trac.sagemath.org/ticket/27480

2460b664-5ecd-43fc-9bbe-d0f333762988 commented 5 years ago
comment:2

I have a brief idea about it(used it in my college project) and can implement it in Sage modules.

2460b664-5ecd-43fc-9bbe-d0f333762988 commented 5 years ago
comment:3

I just found that networkx library already has Page Rank algorithm implemented in Python. Should I implement it in Sage freshly or should I make a function to use networkx implementation ? Or maybe we can write a faster cython implemetation of it. Need some suggetsions!

dcoudert commented 5 years ago
comment:4

networkx has several methods to compute Page rank: a pure Python, one using numpy, another using scipy, etc.

The optional package igraph also has an implementation of pagerank (see igraph documentation for pagerank (to install igraph, do sage -i igraph and sage -i python_igraph).

The best thing to do is to create a method including a parameter algorithm, and that will call the different algorithms. So algorithm could be None``,'networkx','numpy','scipy','igraph'`.

2460b664-5ecd-43fc-9bbe-d0f333762988 commented 5 years ago
comment:5

When it is None, should an implementation in sage be used? And if so should I do an implementation in python or cython?

dcoudert commented 5 years ago
comment:6

When it's None, the method should choose the best available implementation. And the best implementation could depend on the size or density of the (di)graph. For instance, for very large graphs, methods based on matrix multiplication might not be appropriate (memory consumption), while such methods might be very efficient for small graphs. Some measurements are needed to decide.

Before deciding if a new implementation is needed, we must know if what we can easily get is fast enough or not.

2460b664-5ecd-43fc-9bbe-d0f333762988 commented 5 years ago

Branch: u/gh-rajat1433/27480_page_rank

2460b664-5ecd-43fc-9bbe-d0f333762988 commented 5 years ago

Dependencies: #27496

2460b664-5ecd-43fc-9bbe-d0f333762988 commented 5 years ago
comment:9

I have installed igraph in my sage module. But however its pagerank algorithm is not able to work. It keeps on throwing a segmentation error. I use igraph_graph to convert sage graph into igraph. All other algorithms of igraph works fine. Following error message I get:

Cython backtrace
----------------

29  ../sysdeps/unix/sysv/linux/waitpid.c: No such file or directory.
Traceback (most recent call last):
  File "<string>", line 56, in <module>
  File "/usr/lib/python3/dist-packages/Cython/Debugger/libcython.py", line 689, in invoke
    for arg in string_to_argv(args):
TypeError: argument 1 must be str, not bytes
Saved trace to /home/rajat/.sage/crash_logs/crash_yLpPiW.log
------------------------------------------------------------------------
Unhandled SIGSEGV: A segmentation fault occurred.
This probably occurred because a *compiled* module has a bug
in it and is not properly wrapped with sig_on(), sig_off().
Python will now terminate.

Regarding networkx,numpy and scipy, I have tested them on their runtime. Numpy does it by matrix multiplication and solving for eigenvalues, networkx has pure python implementation and scipy does matrix multiplication iteratively.

Following are the runtimes I got:

sage: G=nx.gnp_random_graph(40,0.01,directed=True)
sage: %timeit nx.pagerank_numpy(G)
1000 loops, best of 3: 883 µs per loop
sage: %timeit nx.pagerank_scipy(G)
1000 loops, best of 3: 1.7 ms per loop
sage: %timeit nx.pagerank(G)
1000 loops, best of 3: 1.7 ms per loop

sage: G=nx.gnp_random_graph(60,0.01,directed=True)
sage: %timeit nx.pagerank(G)
100 loops, best of 3: 2.87 ms per loop
sage: %timeit nx.pagerank_scipy(G)
1000 loops, best of 3: 1.87 ms per loop
sage: %timeit nx.pagerank_numpy(G)
1000 loops, best of 3: 1.58 ms per loop

sage: G=nx.gnp_random_graph(70,0.01,directed=True)
sage: %timeit nx.pagerank(G)
100 loops, best of 3: 3.66 ms per loop
sage: %timeit nx.pagerank_numpy(G)
100 loops, best of 3: 2.22 ms per loop
sage: %timeit nx.pagerank_scipy(G)
100 loops, best of 3: 1.99 ms per loop

sage: G=nx.gnp_random_graph(80,0.01,directed=True)
sage: %timeit nx.pagerank(G)
100 loops, best of 3: 4.66 ms per loop
sage: %timeit nx.pagerank_scipy(G)
1000 loops, best of 3: 1.93 ms per loop
sage: %timeit nx.pagerank_numpy(G)
100 loops, best of 3: 3.3 ms per loop

sage: G=nx.gnp_random_graph(100,0.01,directed=True)
sage: %timeit nx.pagerank(G)
100 loops, best of 3: 7.07 ms per loop
sage: %timeit nx.pagerank_scipy(G)
100 loops, best of 3: 2.29 ms per loop
sage: %timeit nx.pagerank_numpy(G)
100 loops, best of 3: 4.63 ms per loop

sage: G=nx.gnp_random_graph(400,0.01,directed=True)
sage: %timeit nx.pagerank_numpy(G)
10 loops, best of 3: 175 ms per loop
sage: %timeit nx.pagerank_scipy(G)
100 loops, best of 3: 3.9 ms per loop
sage: %timeit nx.pagerank(G)
10 loops, best of 3: 53.2 ms per loop

sage: G=nx.gnp_random_graph(4000,0.01,directed=True)
sage: %timeit nx.pagerank(G)
1 loop, best of 3: 1.81 s per loop
sage: %timeit nx.pagerank_scipy(G)
1 loop, best of 3: 206 ms per loop
sage: %timeit nx.pagerank_numpy(G)
1 loop, best of 3: 1min 1s per loop

As per my analysis upto around 60 vertices numpy is fast but scipy is fastest after that.

dcoudert commented 5 years ago
comment:10

I don't know what's the problem with igraph. I have installed igraph and python_igraph on my OSX laptop and I can do

sage: G = graphs.RandomGNP(1000, .01)
sage: I = G.igraph_graph()
sage: %timeit I.pagerank()
100 loops, best of 3: 1.92 ms per loop

Try recompiling sage (make build or sage -b). If not working, you can ask for help on sage-devel. Explain what you did to install igraph, that other algorithms of igraph are working fine for you, and what's the error message you get.

dcoudert commented 5 years ago
comment:11

Note that a random graph on 40 nodes with probability 0.01 has in average 8 edges... so your experiments are not very conclusive on small graphs. Nonetheless, scipy seems the most efficient version.

2460b664-5ecd-43fc-9bbe-d0f333762988 commented 5 years ago
comment:12

More experiments with dense small graphs also suggest numpy to be best for small graphs

sage: G=nx.gnp_random_graph(40,0.5,directed=True)
sage:  **%timeit nx.pagerank_numpy(G)
The slowest run took 86.64 times longer than the fastest. This could mean that an intermediate result is being cached.
1 loop, best of 3: 2.32 ms per loop**
sage: %timeit nx.pagerank_scipy(G)
The slowest run took 60.81 times longer than the fastest. This could mean that an intermediate result is being cached.
100 loops, best of 3: 2.41 ms per loop
sage: %timeit nx.pagerank(G)
100 loops, best of 3: 12.5 ms per loop

sage: G=nx.gnp_random_graph(60,0.5,directed=True)
sage: %timeit nx.pagerank(G)
10 loops, best of 3: 27 ms per loop
sage: %timeit nx.pagerank_scipy(G)
100 loops, best of 3: 3.47 ms per loop
sage: **%timeit nx.pagerank_numpy(G)
100 loops, best of 3: 3.13 ms per loop**
embray commented 5 years ago
comment:13

Ticket retargeted after milestone closed (if you don't believe this ticket is appropriate for the Sage 8.8 release please retarget manually)

2460b664-5ecd-43fc-9bbe-d0f333762988 commented 5 years ago

Changed dependencies from #27496 to #27496, #27502

2460b664-5ecd-43fc-9bbe-d0f333762988 commented 5 years ago
comment:15

Page rank for Ipython with weighted edges doesn't seem to be working: However there is a parameter called weights in Pagerank method of Igraph , I looked at its code and doc but it is not clear to me if it is using it.

https://github.com/igraph/python-igraph/blob/master/src/graphobject.c

https://igraph.org/python/doc/igraph.GraphBase-class.html#personalized_pagerank

sage: G = Graph(6)
sage: I = G.igraph_graph()
sage: I.add_edge(0,0,weight=40)
igraph.Edge(<igraph.Graph object at 0x7f338814e148>, 0, {'weight': 40})
sage: I.add_edge(1,2,weight=50)
igraph.Edge(<igraph.Graph object at 0x7f338814e148>, 1, {'weight': 50})
sage: I.add_edge(2,3,weight=60)
igraph.Edge(<igraph.Graph object at 0x7f338814e148>, 2, {'weight': 60})
sage: I.add_edge(0,3,weight=70)
igraph.Edge(<igraph.Graph object at 0x7f338814e148>, 3, {'weight': 70})
sage: I.add_edge(3,4,weight=80)
igraph.Edge(<igraph.Graph object at 0x7f338814e148>, 4, {'weight': 80})
sage: I.add_edge(4,5,weight=20)
igraph.Edge(<igraph.Graph object at 0x7f338814e148>, 5, {'weight': 20})
sage: I.pagerank(weights='weight')
[0.1380494948975056,
 0.11517272680482316,
 0.19683228912731204,
 0.237940473238224,
 0.196832289127312,
 0.11517272680482313]
sage: I.pagerank()
[0.1380494948975056,
 0.11517272680482316,
 0.19683228912731204,
 0.237940473238224,
 0.196832289127312,
 0.11517272680482313]
2460b664-5ecd-43fc-9bbe-d0f333762988 commented 5 years ago
comment:16

Following experiments show igraph's results to be the best due to its c-language code.

sage: G = graphs.RandomGNP(40, 0.2)
sage: n = G.networkx_graph()
sage: %timeit nx.pagerank_numpy(n)
1000 loops, best of 3: 1.06 ms per loop
sage: %timeit nx.pagerank_scipy(n)
The slowest run took 37.63 times longer than the fastest. This could mean that an intermediate result is being cached.
100 loops, best of 3: 2.31 ms per loop
sage:  %timeit nx.pagerank(n)
100 loops, best of 3: 11.3 ms per loop
sage: i = G.igraph_graph()
sage: %timeit i.pagerank()
10000 loops, best of 3: **23.3 µs** per loop

sage: G = graphs.RandomGNP(40, 0.8)
sage: n = G.networkx_graph()
sage: %timeit nx.pagerank_numpy(n)
1000 loops, best of 3: 1.33 ms per loop
sage: %timeit nx.pagerank_scipy(n)
100 loops, best of 3: 2.16 ms per loop
sage:  %timeit nx.pagerank(n)
10 loops, best of 3: 20 ms per loop
sage: i = G.igraph_graph()
sage: %timeit i.pagerank()
10000 loops, best of 3: **39.1 µs** per loop

sage: G = graphs.RandomGNP(4000, 0.5)
sage: n = G.networkx_graph()
sage: i = G.igraph_graph()
sage: %timeit i.pagerank()
1 loop, best of 3: **1.03 s** per loop
sage: n = G.networkx_graph()
sage: %timeit nx.pagerank_numpy(n)
1 loop, best of 3: 1min 9s per loop
sage: %timeit nx.pagerank_scipy(n)
1 loop, best of 3: 7.12 s per loop
sage:  %timeit nx.pagerank(n)
....: 
....: 
Killed
dcoudert commented 5 years ago
comment:17

The code of the pagerank algorithm is here https://github.com/igraph/igraph/blob/master/src/centrality.c

2460b664-5ecd-43fc-9bbe-d0f333762988 commented 5 years ago

Changed branch from u/gh-rajat1433/27480_page_rank to u/gh-rajat1433/27480_page_rank_implementation

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 5 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

b2c82ecFixes cycle basis for multiedges
1a68ce1Add doctests
142e50fMerge branch 'develop' of git://trac.sagemath.org/sage into develop
8836f0dcheckpoint1
a2f38d9merged
8433c2amerged
b57b730pagerank method
ecbb4a7removed xtra lines
7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 5 years ago

Commit: ecbb4a7

2460b664-5ecd-43fc-9bbe-d0f333762988 commented 5 years ago

Changed branch from u/gh-rajat1433/27480_page_rank_implementation to u/gh-rajat1433/27480_page_rank_algo

2460b664-5ecd-43fc-9bbe-d0f333762988 commented 5 years ago

Changed commit from ecbb4a7 to none

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 5 years ago

Commit: 142e50f

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 5 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

b2c82ecFixes cycle basis for multiedges
1a68ce1Add doctests
142e50fMerge branch 'develop' of git://trac.sagemath.org/sage into develop
7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 5 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

faafb79branch corrected
7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 5 years ago

Changed commit from 142e50f to faafb79

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 5 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

cdef286fix small mistakes
7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 5 years ago

Changed commit from faafb79 to cdef286

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 5 years ago

Changed commit from cdef286 to b4e6b8c

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 5 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

b4e6b8cremoving spaces
2460b664-5ecd-43fc-9bbe-d0f333762988 commented 5 years ago
comment:25

Sorry for the mess. Its due to update in the sage version now its corrected :)

2460b664-5ecd-43fc-9bbe-d0f333762988 commented 5 years ago
comment:26

I think that this is an issue on igraph side that its not able to work for weighted graphs in python. https://lists.nongnu.org/archive/html/igraph-help/2008-08/msg00030.html

Still I am seeing through how the code of igraph can be fixed to take weights in pagerank. Else we can use other libraries like scipy for weighted case.

2460b664-5ecd-43fc-9bbe-d0f333762988 commented 5 years ago
comment:27

I think the problem is with the version or codebase of igraph which sage currently uses , maybe its not been updated.

Using sage:

sage: G = Graph(5)
sage: I = G.igraph_graph()
sage: I.add_edge(0,1,weight=50)
igraph.Edge(<igraph.Graph object at 0x7fe144ca7a00>, 0, {'weight': 50})
sage: I.add_edge(1,2,weight=70)
....: 
igraph.Edge(<igraph.Graph object at 0x7fe144ca7a00>, 1, {'weight': 70})
sage: I.add_edge(2,4,weight=35)
igraph.Edge(<igraph.Graph object at 0x7fe144ca7a00>, 2, {'weight': 35})
sage: 
....: 
sage: I.add_edge(3,4,weight=35)
igraph.Edge(<igraph.Graph object at 0x7fe144ca7a00>, 3, {'weight': 35})
sage: I.pagerank()
[0.134527027027027,
 0.24594594594594593,
 0.23905405405405405,
 0.134527027027027,
 0.24594594594594593]
sage: I.pagerank(weights='weight')
[0.134527027027027,
 0.24594594594594593,
 0.23905405405405405,
 0.134527027027027,
 0.24594594594594593]

Using python terminal

>>> I = Graph(5)
>>> I
<igraph.Graph object at 0x7f5bcd250528>
>>> I.summary()
'IGRAPH U--- 5 0 -- '
>>> I.add_edge(0,1,weight=50)
>>> I.add_edge(1,2,weight=70)
>>> I.add_edge(2,4,weight=35)
>>> I.add_edge(3,4,weight=35)
>>> I.pagerank()
[0.134527027027027, 0.24594594594594593, 0.23905405405405405, 0.134527027027027, 0.24594594594594593]
>>> I.pagerank(weights='weight')
[0.1326579757790889, 0.2898578139644863, 0.2595856492098718, 0.11586448311914739, 0.20203407792740563]
dcoudert commented 5 years ago
comment:28

Sage 8.8.beta0 has been released and it includes the 2 tickets marked as dependencies. So you can update you develop branch and rebuild this ticket using last beta.

If it is not possible to use weight with igraph, add a TODO block in the method to say that it should be done someday.

2460b664-5ecd-43fc-9bbe-d0f333762988 commented 5 years ago
comment:29

I have already updated to sage8.8 beta0 today only!

Please read comment 27, as I am able to use weight parameter of igraph in my python console but in sage its not working (don't know why this discrepancy is present).

2460b664-5ecd-43fc-9bbe-d0f333762988 commented 5 years ago

Changed dependencies from #27496, #27502 to none

2460b664-5ecd-43fc-9bbe-d0f333762988 commented 5 years ago
comment:31

Should I open a new ticket mentioning this discrepancy of igraph package in sage?

dcoudert commented 5 years ago
comment:32

I don't understand what's going on. Have you tried other methods using weights?

2460b664-5ecd-43fc-9bbe-d0f333762988 commented 5 years ago
comment:33

Yes other algorithms like dijkstra are working fine with weights. I think its an old bug of igraph which is updated as can be seen by the results in my python shell however its not been updated in sage version of igraph.

Reference: https://lists.nongnu.org/archive/html/igraph-help/2008-08/msg00030.html

dcoudert commented 5 years ago
comment:34

May the version that has been updated in #27502 is not the same as the version you have on your computer ? upstream should update python_igraph to be at least Python3 compatible. It currently contains deprecated stuff, so don't pass python3 tests :(

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 5 years ago

Changed commit from b4e6b8c to 78b1f8b

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 5 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

46051b4remove spaces
566f979indenting
78b1f8bimproved code
2460b664-5ecd-43fc-9bbe-d0f333762988 commented 5 years ago

Changed keywords from none to pagerank

2460b664-5ecd-43fc-9bbe-d0f333762988 commented 5 years ago

Author: Rajat Mittal

2460b664-5ecd-43fc-9bbe-d0f333762988 commented 5 years ago

Reviewer: David Coudert

2460b664-5ecd-43fc-9bbe-d0f333762988 commented 5 years ago
comment:37

I have used scipy in case of weighted graphs and igraph in case of unweighted as default case as per the results I got mentioned in the previous comments.

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 5 years ago

Changed commit from 78b1f8b to 112cac9

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 5 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

112cac9correct igraph example
dcoudert commented 5 years ago
comment:39

Replying to @rajat1433:

I have used scipy in case of weighted graphs and igraph in case of unweighted as default case as per the results I got mentioned in the previous comments.

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 5 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

e7ae5daimproved the code
7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 5 years ago

Changed commit from 112cac9 to e7ae5da