Hi Joel,
From my point of view, your implementation of the simplified PageRank algorithm does not follow the protocol outlined in the book. I only have the first edition at hand, which says:
1. There is a total of 1.0 PageRank in the network.
   This should still be true at the end of the calculation, but your implementation violates it: the total PageRank at the end of the calculation in your script amounts to 1.0425.
2. Initially this PageRank is equally distributed among nodes.
3. At each step, a large fraction of each node's PageRank is distributed evenly among its outgoing links.
4. At each step, the remainder of each node's PageRank is distributed evenly among all nodes.
   This point is missing in your implementation.
I am proposing a version that implements point 4 in a straightforward way. There might be more elegant ways, but this one is easy to understand. The results are numerically very close to those of the NetworkX implementation; see the numbers below.
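As an illustration only, here is a minimal sketch of how the four points quoted from the book could be implemented. The function name `page_rank_sketch`, its parameters, and the three-node toy graph are hypothetical. The handling of nodes without outgoing links (their PageRank is spread evenly over all nodes, which is also the NetworkX default for dangling nodes) is an assumption and not necessarily identical to the code being proposed.

```python
from collections import Counter
from typing import Dict, List, Tuple


def page_rank_sketch(node_ids: List[int],
                     edges: List[Tuple[int, int]],
                     damping: float = 0.85,
                     num_iters: int = 100) -> Dict[int, float]:
    """Hypothetical sketch of simplified PageRank that keeps the total at 1.0."""
    # Number of outgoing links per node (counted on the source side).
    outgoing_counts = Counter(source for source, _ in edges)
    n = len(node_ids)

    # Point 2: start with the PageRank spread equally over all nodes.
    pr = {node: 1.0 / n for node in node_ids}

    for _ in range(num_iters):
        # Assumption: rank held by nodes without outgoing links is shared
        # evenly among all nodes instead of being lost.
        dangling = sum(pr[node] for node in node_ids
                       if outgoing_counts[node] == 0)

        # Point 4: the remainder (1 - damping) of every node's rank, plus
        # the damped dangling rank, is distributed evenly among all nodes.
        base = ((1 - damping) * sum(pr.values()) + damping * dangling) / n
        next_pr = {node: base for node in node_ids}

        # Point 3: the damped fraction goes evenly along outgoing links.
        for source, target in edges:
            next_pr[target] += damping * pr[source] / outgoing_counts[source]

        pr = next_pr

    return pr


# Toy graph (hypothetical): node 2 has no outgoing links.
demo_ranks = page_rank_sketch([0, 1, 2], [(0, 1), (0, 2), (1, 2)])
print(demo_ranks, sum(demo_ranks.values()))
```

Because every share handed out in one step is taken from the previous step's ranks, the total stays at 1.0 after each iteration (point 1), up to floating-point rounding.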
Original results
user id: PageRank
0: 0.1
1: 0.1
2: 0.1
3: 0.1
4: 0.14250000000000002
5: 0.1
6: 0.1
7: 0.1
8: 0.1
9: 0.1
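The total of 1.0425 can be checked directly from the values listed for the original script. This is just a small arithmetic check; the variable names are chosen here for illustration.

```python
# PageRank values reported for the original script, keyed by user id.
original_pr = {0: 0.1, 1: 0.1, 2: 0.1, 3: 0.1, 4: 0.14250000000000002,
               5: 0.1, 6: 0.1, 7: 0.1, 8: 0.1, 9: 0.1}

# Point 1 requires this total to be 1.0.
total = sum(original_pr.values())
excess = total - 1.0
print(f"total = {total:.4f}, excess over 1.0 = {excess:.4f}")
```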
NetworkX results

    import networkx as nx

    G = nx.DiGraph()
    G.add_nodes_from([user.id for user in users])
    G.add_edges_from(endorsements)
    pr_nx = nx.pagerank(G, alpha=0.85)

user id: PageRank
0: 0.09499151348469306
1: 0.10547758964858775
2: 0.10547758964858775
3: 0.09499151348469306
4: 0.1593177423515437
5: 0.10200959185661473
6: 0.07857495588955458
7: 0.07857495588955458
8: 0.10200959185661472
9: 0.07857495588955458
New results
user id: PageRank
0: 0.0949906958425375
1: 0.10547659652084887
2: 0.10547659652084887
3: 0.0949906958425375
4: 0.1593168333463994
5: 0.10201123958329422
6: 0.07857536758674652
7: 0.07857536758674652
8: 0.10201123958329422
9: 0.07857536758674652
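To put a number on "very close", the two result sets can be compared entry by entry. The snippet below only uses the values listed in this post; the variable names are illustrative.

```python
# Values copied from the NetworkX and new results, keyed by user id.
pr_networkx = {0: 0.09499151348469306, 1: 0.10547758964858775,
               2: 0.10547758964858775, 3: 0.09499151348469306,
               4: 0.1593177423515437, 5: 0.10200959185661473,
               6: 0.07857495588955458, 7: 0.07857495588955458,
               8: 0.10200959185661472, 9: 0.07857495588955458}
pr_new = {0: 0.0949906958425375, 1: 0.10547659652084887,
          2: 0.10547659652084887, 3: 0.0949906958425375,
          4: 0.1593168333463994, 5: 0.10201123958329422,
          6: 0.07857536758674652, 7: 0.07857536758674652,
          8: 0.10201123958329422, 9: 0.07857536758674652}

# Largest absolute difference between the two rankings.
max_diff = max(abs(pr_networkx[uid] - pr_new[uid]) for uid in pr_networkx)

# Both totals should be 1.0 up to rounding (point 1).
total_networkx = sum(pr_networkx.values())
total_new = sum(pr_new.values())
print(f"max difference = {max_diff:.2e}")
print(f"totals: NetworkX = {total_networkx:.6f}, new = {total_new:.6f}")
```

Both totals come out at 1.0 within rounding, and the largest per-node difference is well below 1e-5.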