neo4j-contrib / neo4j-graph-algorithms

Efficient Graph Algorithms for Neo4j
https://github.com/neo4j/graph-data-science/
GNU General Public License v3.0
767 stars 195 forks source link

min/max/sum-centrality is not reported correctly for betweeness #688

Open jexp opened 5 years ago

jexp commented 5 years ago

It should not be -1 on all accounts.

CALL algo.betweenness('Character','RELATED_TO', {direction:'out',write:true, writeProperty:'betweenness'})
       YIELD nodes, minCentrality, maxCentrality, sumCentrality, loadMillis, computeMillis, writeMillis;
+--------------------------------------------------------------------------------------------------+
| nodes | minCentrality | maxCentrality | sumCentrality | loadMillis | computeMillis | writeMillis |
+--------------------------------------------------------------------------------------------------+
| 43    | -1.0          | -1.0          | -1.0          | 16         | 36            | 11          |
+--------------------------------------------------------------------------------------------------+

match (n:Character) return n.name, n.betweenness;
+------------------------------------+
| n.name       | n.betweenness       |
+------------------------------------+
| "Aragorn"    | 0.0                 |
| "Arathorn"   | 0.0                 |
| "Arwen"      | 5.05                |
| "Balin"      | 0.125               |
| "Beregond"   | 0.0                 |
| "Bilbo"      | 6.4258658008658     |
| "Bill"       | 0.0                 |
| "Boromir"    | 23.949476911976905  |
| "Celeborn"   | 1.2011904761904761  |
| "Denethor"   | 3.2444555444555445  |
| "Durin"      | 0.9420995670995671  |
| "Elendil"    | 10.238701576201578  |
| "Elrond"     | 20.924873737373733  |
| "Éomer"      | 9.837312687312687   |
| "Eorl"       | 3.0356893106893104  |
| "Éowyn"      | 5.4                 |
| "Faramir"    | 15.694617882117882  |
| "Frodo"      | 67.11694832944833   |
| "Galadriel"  | 9.221302308802308   |
| "Gandalf"    | 41.49869436119435   |

match (n:Character) with n.betweenness as b return min(b), max(b), sum(b);
+-------------------------------------------------+
| min(b) | max(b)            | sum(b)             |
+-------------------------------------------------+
| 0.0    | 67.11694832944833 | 379.00000000000006 |
+-------------------------------------------------+
mknblch commented 5 years ago

Hi min, max and sum is more like an old relict. Those values are helpfull but they need a full iteration after the algo terminates. Having more statistics or a histogram (as you mentioned in #652 ) would make more sense in my opinion. But then we should focus on implementing a robust toolkit for this task which can be used in other algos as well. Im more concered about Aragorn and the other 0.0's. Are those nodes disconnected?

jexp commented 5 years ago

we started to add min, max, mean, stddev, p25, p50,p75,p90,p95,p99,p999,p100 for Jaccard and I think we should add it to all the other non-streaming algorithms too.

Agree on the toolkit. We used (Atomic)HdrHistogram to gather those stats.

miromarchi commented 5 years ago

Just a comment: I think min and max are useful to normalize the centrality, or that could come as an optional param too.