neo4j-contrib / neo4j-graph-algorithms

Efficient Graph Algorithms for Neo4j
https://github.com/neo4j/graph-data-science/
GNU General Public License v3.0

Betweenness concurrency issue #445

Closed tomasonjo closed 7 years ago

tomasonjo commented 7 years ago

I tried to reproduce this issue on a smaller graph, but could not find a small example in which it occurs. I was initially using Cypher loading to load only the biggest component, but I also tested label loading and the error persists.

I cross-referenced the results with Gephi. Running the algorithm with concurrency: 1 returns the same 20 heroes with the highest betweenness as Gephi, but with some different betweenness scores. Gephi results for reference:

[image: gephi-crossreference]

Our closeness centrality suffers greatly from disconnected components. If I run the algorithm only on the largest weakly connected component (WCC), the results match exactly. I will create a separate issue for this.

Example data:

It is best to create the constraints first:

CALL apoc.schema.assert(
{},
{Comic:['name'],Hero:['name']});

Load the data:

USING PERIODIC COMMIT 5000
LOAD CSV WITH HEADERS FROM 
"https://raw.githubusercontent.com/tomasonjo/neo4j-marvel/master/data/edges.csv" as row
MERGE (h:Hero{name:row.hero})
MERGE (c:Comic{name:row.comic})
MERGE (h)-[:APPEARED_IN]->(c)

Create the network:

CALL apoc.periodic.iterate(
"MATCH (p1:Hero)-->(:Comic)<--(p2:Hero) WHERE id(p1) < id(p2) RETURN p1, p2",
"MERGE (p1)-[r:KNOWS]-(p2)",
{batchSize:5000, parallel:false, iterateList:true})

If I run betweenness with concurrency higher than 1:

CALL algo.betweenness.stream('Hero', 'KNOWS',
{concurrency:8}) YIELD nodeId, centrality
RETURN nodeId,centrality 
ORDER BY centrality DESC LIMIT 20

Results:

[image: conc8]

Running betweenness with concurrency 1:

CALL algo.betweenness.stream('Hero', 'KNOWS',
{concurrency:1}) YIELD nodeId, centrality
RETURN nodeId,centrality 
ORDER BY centrality DESC LIMIT 20

Results:

[image: conc1]

mknblch commented 7 years ago

Can you please try a different scaleFactor in the configuration? The default is 100_000, which might be too high; let's try 100 or so: {scaleFactor:100}

tomasonjo commented 7 years ago

Setting scaleFactor to 100 solved the problem.
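
For anyone landing here later, the rerun presumably looks like the sketch below: the same streaming call as above, with scaleFactor added next to concurrency. Putting both keys in one config map is an assumption; the thread only confirms that {scaleFactor:100} resolved the discrepancy on this dataset.

```cypher
// Same parallel betweenness call as before.
// Assumption: scaleFactor goes in the same config map as concurrency.
CALL algo.betweenness.stream('Hero', 'KNOWS',
{concurrency:8, scaleFactor:100}) YIELD nodeId, centrality
RETURN nodeId, centrality
ORDER BY centrality DESC LIMIT 20
```

If the top 20 now agrees with the concurrency:1 run, the scaling was the likely culprit. The value 100 is simply the one suggested above, not a tuned recommendation for other graphs.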

I will close the issue as you have already created https://github.com/neo4j-contrib/neo4j-graph-algorithms/issues/455