neo4j-contrib / neo4j-graph-algorithms

Efficient Graph Algorithms for Neo4j
https://github.com/neo4j/graph-data-science/
GNU General Public License v3.0
772 stars 194 forks source link

very different outputs of apoc page-rank and new page-rank? #89

Open jexp opened 7 years ago

jexp commented 7 years ago

Like 2 orders of magnitude different score values?

CALL algo.pageRankStream('Person','KNOWS') yield node, score
> WITH * order by score DESC LIMIT 10
> RETURN id(node), node.firstName, node.lastName, score;
+-------------------------------------------------------------------+
| id(node) | node.firstName | node.lastName | score                 |
+-------------------------------------------------------------------+
| 2018160  | "Chen"         | "Yang"        | 0.0035972294696054335 |
| 2025347  | "Wei"          | "Wang"        | 0.00338002511591596   |
| 2016633  | "Rudolf"       | "Hoffmann"    | 0.003266771041832335  |
| 2024012  | "Rafael"       | "Suissa"      | 0.0032483767089390176 |
| 2024437  | "Zhang"        | "Hu"          | 0.002917721858609681  |
| 2020592  | "Paul"         | "Bach"        | 0.0026936801886415437 |
| 2017530  | "Yang"         | "Li"          | 0.002374719927258027  |
| 2017368  | "Karl"         | "Fischer"     | 0.002339354242148342  |
| 2020038  | "Wei"          | "Zhu"         | 0.002295584953345968  |
| 2022831  | "Lei"          | "Xu"          | 0.002257252684833366  |
+-------------------------------------------------------------------+
10 rows
172 ms
neo4j-sh (?)$ 
neo4j-sh (?)$ MATCH (p:Person) WITH collect(p) as nodes
> CALL apoc.algo.pageRankWithConfig(nodes,{iterations:20,types:'KNOWS'}) YIELD node, score
> WITH * order by score DESC LIMIT 10
> RETURN id(node), node.firstName, node.lastName, score;
+------------------------------------------------------+
| id(node) | node.firstName  | node.lastName | score   |
+------------------------------------------------------+
| 2019080  | "Anucha"        | "Charoenpura" | 0.48365 |
| 2016852  | "Bingbing"      | "Liu"         | 0.47604 |
| 2022828  | "Javier"        | "Gonzalez"    | 0.47339 |
| 2022593  | "Mohamed"       | "Achiou"      | 0.46651 |
| 2017530  | "Yang"          | "Li"          | 0.45797 |
| 2020196  | "Eve-Mary Thai" | "Hoang"       | 0.45622 |
| 2017641  | "Alexander"     | "Hleb"        | 0.43794 |
| 2016164  | "Avraham"       | "Alhouthi"    | 0.43247 |
| 2018520  | "Jun"           | "Zhang"       | 0.42199 |
| 2020288  | "Yang"          | "Lei"         | 0.41781 |
+------------------------------------------------------+
10 rows
32118 ms
mknblch commented 7 years ago

There are several variations of PageRank. We implemented PR = ( 1 - d ) / N + d * sum ( PR_v / outDegree_v ). We should check both implementations.

jexp commented 7 years ago

Afaik that's the same that the other page-rank implements (the textbook one). Yes, please check.

mknblch commented 7 years ago

I reviewed our pageRank implementation and noticed a possible bug which might lead to wrong results. We work on the pageRank array while we doing writes on it in one iteration. We should create an empty buffer for the new ranks while reading on the old ranks from the previous iteration.

jexp commented 7 years ago

Cool, please send the PR when you get it fixed.

knutwalker commented 7 years ago

possibly fixed in #174

knutwalker commented 7 years ago

@tomasonjo can you please verify whether this mismatch still happens?