neo4j-contrib / neo4j-graph-algorithms

Efficient Graph Algorithms for Neo4j
https://github.com/neo4j/graph-data-science/
GNU General Public License v3.0

Use floats instead of scaled ints for PR score #872

Closed. knutwalker closed this 5 years ago

knutwalker commented 5 years ago

Potential solution for #861

Benchmarks show a decrease in performance of 0.5 to 1%, well within the error margin.
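For readers unfamiliar with the trade-off named in the title, here is a minimal sketch of the difference between storing a score as a scaled integer and storing it as a floating-point value. It is an illustration only, not the code touched by this PR: the scale factor `SCALE` and all function names are hypothetical, and Python's `float` (a 64-bit double) stands in for whatever floating-point type the implementation uses.

```python
# Hypothetical scale factor, chosen for illustration; not taken from the PR.
SCALE = 100_000


def to_scaled_int(score: float) -> int:
    """Encode a score as an integer by multiplying with SCALE and truncating."""
    return int(score * SCALE)


def from_scaled_int(scaled: int) -> float:
    """Decode a scaled integer back into a score."""
    return scaled / SCALE


def contribution_scaled(rank: float, degree: int) -> float:
    """Share of `rank` sent to each neighbour, round-tripped through a scaled int."""
    return from_scaled_int(to_scaled_int(rank / degree))


def contribution_float(rank: float, degree: int) -> float:
    """Share of `rank` sent to each neighbour, kept as a floating-point value."""
    return rank / degree


# A node with very many neighbours sends a tiny share to each of them.
tiny_scaled = contribution_scaled(0.15, 1_000_000)
tiny_float = contribution_float(0.15, 1_000_000)
print(tiny_scaled, tiny_float)
```

In this sketch, any contribution smaller than `1 / SCALE` truncates to zero in the scaled-int encoding, while the floating-point version keeps it. Larger contributions differ by less than one scale step.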

[Images: heavy-pr-diff-1, huge-pr-diff-1]

Benchmarks were executed with:

java -jar benchmark/target/benchmark.jar -rf json -rff page-rank.json -gc true -prof gc org.neo4j.graphalgo.bench.PageRankBenchmarkLdbc.*

Benchmark JSON files:

bechmark-files.zip
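To compare two result files written with `-rf json`, something like the following sketch could work. It assumes the usual JMH JSON layout (a list of entries with `benchmark`, optional `params`, and `primaryMetric.score`). The function name is hypothetical, and the sample entries are made-up placeholders, not numbers from this PR.

```python
import json


def relative_change_percent(baseline_json: str, candidate_json: str) -> dict:
    """Map (benchmark, params) to the percent change of the primary score."""

    def key(entry):
        # Entries of the same benchmark with different params are kept apart.
        return entry["benchmark"], tuple(sorted(entry.get("params", {}).items()))

    baseline = {
        key(entry): entry["primaryMetric"]["score"]
        for entry in json.loads(baseline_json)
    }
    changes = {}
    for entry in json.loads(candidate_json):
        before = baseline.get(key(entry))
        if before is not None:
            after = entry["primaryMetric"]["score"]
            changes[key(entry)] = (after - before) / before * 100.0
    return changes


# Placeholder data in the assumed JMH layout; these are NOT measured results.
baseline_json = json.dumps([
    {"benchmark": "example.Bench.run", "params": {"graph": "heavy"},
     "primaryMetric": {"score": 200.0, "scoreUnit": "ms/op"}},
])
candidate_json = json.dumps([
    {"benchmark": "example.Bench.run", "params": {"graph": "heavy"},
     "primaryMetric": {"score": 202.0, "scoreUnit": "ms/op"}},
])
changes = relative_change_percent(baseline_json, candidate_json)
for (name, params), percent in changes.items():
    print(f"{name} {dict(params)}: {percent:+.2f}%")
```

Whether a positive change is a slowdown depends on the benchmark mode: for time-per-operation modes a higher score is worse, for throughput it is better.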

jexp commented 5 years ago

On a larger dataset it shows more impact:

call apoc.warmup.run();
call apoc.warmup.run();
call algo.graph.load('db');
call algo.pageRank(null,null,{graph:'db',write:false});
call algo.pageRank(null,null,{graph:'db',write:false});
call algo.pageRank(null,null,{graph:'db',write:false});

3.4.0.9

with int:

neo4j-sh (?)$ call algo.graph.load('db');
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| name | graph   | direction  | undirected | sorted | nodes    | loadMillis | alreadyLoaded | nodeWeight | relationshipWeight | nodeProperty | loadNodes | loadRelationships |
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| "db" | "heavy" | "OUTGOING" | false      | false  | 11474730 | 9770       | false         | <null>     | <null>             | <null>       | ""        | ""                |
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row
9824 ms
neo4j-sh (?)$ call algo.pageRank(null,null,{graph:'db',write:false});
+----------------------------------------------------------------------------------------------------------+
| nodes    | iterations | loadMillis | computeMillis | writeMillis | dampingFactor | write | writeProperty |
+----------------------------------------------------------------------------------------------------------+
| 11474730 | 20         | 0          | 14391         | -1          | 0.85          | false | <null>        |
+----------------------------------------------------------------------------------------------------------+
1 row
14527 ms
neo4j-sh (?)$ call algo.pageRank(null,null,{graph:'db',write:false});
+----------------------------------------------------------------------------------------------------------+
| nodes    | iterations | loadMillis | computeMillis | writeMillis | dampingFactor | write | writeProperty |
+----------------------------------------------------------------------------------------------------------+
| 11474730 | 20         | 0          | 13671         | -1          | 0.85          | false | <null>        |
+----------------------------------------------------------------------------------------------------------+
1 row
13719 ms
neo4j-sh (?)$ call algo.pageRank(null,null,{graph:'db',write:false});
+----------------------------------------------------------------------------------------------------------+
| nodes    | iterations | loadMillis | computeMillis | writeMillis | dampingFactor | write | writeProperty |
+----------------------------------------------------------------------------------------------------------+
| 11474730 | 20         | 1          | 13424         | -1          | 0.85          | false | <null>        |
+----------------------------------------------------------------------------------------------------------+
1 row
13478 ms

3.4.12.6

with int:

neo4j-sh (?)$ call algo.pageRank(null,null,{graph:'db',write:false});
+----------------------------------------------------------------------------------------------------------+
| nodes    | iterations | loadMillis | computeMillis | writeMillis | dampingFactor | write | writeProperty |
+----------------------------------------------------------------------------------------------------------+
| 11474730 | 20         | 0          | 18852         | -1          | 0.85          | false | <null>        |
+----------------------------------------------------------------------------------------------------------+
1 row
18991 ms
neo4j-sh (?)$ call algo.pageRank(null,null,{graph:'db',write:false});
+----------------------------------------------------------------------------------------------------------+
| nodes    | iterations | loadMillis | computeMillis | writeMillis | dampingFactor | write | writeProperty |
+----------------------------------------------------------------------------------------------------------+
| 11474730 | 20         | 0          | 16662         | -1          | 0.85          | false | <null>        |
+----------------------------------------------------------------------------------------------------------+
1 row
16690 ms
neo4j-sh (?)$ call algo.pageRank(null,null,{graph:'db',write:false});
+----------------------------------------------------------------------------------------------------------+
| nodes    | iterations | loadMillis | computeMillis | writeMillis | dampingFactor | write | writeProperty |
+----------------------------------------------------------------------------------------------------------+
| 11474730 | 20         | 1          | 15013         | -1          | 0.85          | false | <null>        |
+----------------------------------------------------------------------------------------------------------+
1 row
15059 ms

with float:

neo4j-sh (?)$ call algo.pageRank(null,null,{graph:'db',write:false});
+----------------------------------------------------------------------------------------------------------+
| nodes    | iterations | loadMillis | computeMillis | writeMillis | dampingFactor | write | writeProperty |
+----------------------------------------------------------------------------------------------------------+
| 11474730 | 20         | 0          | 19105         | -1          | 0.85          | false | <null>        |
+----------------------------------------------------------------------------------------------------------+
1 row
19239 ms
neo4j-sh (?)$ call algo.pageRank(null,null,{graph:'db',write:false});
+----------------------------------------------------------------------------------------------------------+
| nodes    | iterations | loadMillis | computeMillis | writeMillis | dampingFactor | write | writeProperty |
+----------------------------------------------------------------------------------------------------------+
| 11474730 | 20         | 0          | 19225         | -1          | 0.85          | false | <null>        |
+----------------------------------------------------------------------------------------------------------+
1 row
19254 ms
neo4j-sh (?)$ call algo.pageRank(null,null,{graph:'db',write:false});
+----------------------------------------------------------------------------------------------------------+
| nodes    | iterations | loadMillis | computeMillis | writeMillis | dampingFactor | write | writeProperty |
+----------------------------------------------------------------------------------------------------------+
| 11474730 | 20         | 1          | 17501         | -1          | 0.85          | false | <null>        |
+----------------------------------------------------------------------------------------------------------+
1 row
17546 ms

jexp commented 5 years ago

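Comparing the third run of each (computeMillis):

float takes about 17.5 s vs. about 15.0 s for int on 3.4.12.6.

It also seems that PageRank got slower between 3.4.0.9 (about 13.4 s) and 3.4.12.6 (about 15.0 s), both with int.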
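
As a quick check of those ratios, the relative slowdowns can be computed from the third-run computeMillis values in the tables above. The helper name is made up for this sketch; the numbers are copied from the output.

```python
# computeMillis of the third run, copied from the neo4j-shell output above.
third_run_ms = {
    ("3.4.0.9", "int"): 13424,
    ("3.4.12.6", "int"): 15013,
    ("3.4.12.6", "float"): 17501,
}


def slowdown_percent(baseline_ms: int, candidate_ms: int) -> float:
    """Percent increase in runtime of the candidate relative to the baseline."""
    return (candidate_ms - baseline_ms) / baseline_ms * 100.0


float_vs_int = slowdown_percent(
    third_run_ms[("3.4.12.6", "int")], third_run_ms[("3.4.12.6", "float")]
)
version_regression = slowdown_percent(
    third_run_ms[("3.4.0.9", "int")], third_run_ms[("3.4.12.6", "int")]
)

print(f"float vs int on 3.4.12.6: {float_vs_int:.1f}% slower")
print(f"3.4.12.6 vs 3.4.0.9 (int): {version_regression:.1f}% slower")
```

So on this dataset float is roughly 17% slower than int, and the int version itself is roughly 12% slower on 3.4.12.6 than on 3.4.0.9. These are single runs, so treat the figures as rough.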