neo4j-contrib / neo4j-graph-algorithms

Efficient Graph Algorithms for Neo4j
https://github.com/neo4j/graph-data-science/
GNU General Public License v3.0

Balanced triads on a large graph #732

Open tomasonjo opened 6 years ago

tomasonjo commented 6 years ago

I ran balanced triads on the SNAP signed Epinions network:

https://snap.stanford.edu/data/soc-sign-epinions.html

I imported the graph as undirected, even though the original is directed:

USING PERIODIC COMMIT 10000
LOAD CSV FROM "file:///soc-sign-epinions.txt" as row FIELDTERMINATOR " "
WITH row SKIP 4
MERGE (m1:Member{id:row[0]})
MERGE (m2:Member{id:row[1]})
MERGE (m1)-[t:TRUST]-(m2)
ON CREATE SET t.weight = toInteger(row[2])
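One side effect of this import is worth spelling out: because the pattern in `MERGE` is undirected and the weight is only set `ON CREATE`, a pair of members that appears in both directions in the file ends up as a single relationship carrying whichever sign was read first. A minimal Python sketch of that collapsing behaviour follows. The helper name `collapse_to_undirected` and the sample lines are hypothetical. The sketch skips `#` comment lines instead of skipping the first four rows by position, which assumes the file's header rows are the ones starting with `#`.

```python
def collapse_to_undirected(lines):
    """Mimic MERGE (m1)-[t:TRUST]-(m2) ON CREATE SET t.weight = sign.

    Returns a dict mapping an unordered node pair to the first sign seen.
    """
    weights = {}
    for line in lines:
        line = line.strip()
        # Assumption: header/comment rows start with '#'.
        if not line or line.startswith("#"):
            continue
        src, dst, sign = line.split()
        key = frozenset((src, dst))      # direction is discarded
        weights.setdefault(key, int(sign))  # first-seen sign wins
    return weights


# Hypothetical sample rows, not taken from the real data set.
sample = [
    "# FromNodeId ToNodeId Sign",
    "0 1 -1",
    "1 0 1",   # reverse direction of the same pair: sign is ignored
    "1 2 1",
]
edges = collapse_to_undirected(sample)
```

Here the pair (0, 1) keeps the sign -1 from its first row, so the result depends on row order in the file.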

Run algorithm:

CALL algo.balancedTriads('Member', 'TRUST', {weightProperty:'weight'}) 
YIELD loadMillis, computeMillis, writeMillis, nodeCount, balancedTriadCount, unbalancedTriadCount;

The logs show that the algorithm starts loading the graph, but after 40+ minutes there is still nothing new in the logs:

2018-10-06 20:07:38.433+0000 INFO [node-importer] LOADING 1%

mknblch commented 6 years ago

Looks like the algo did not even start to run; it got stuck during the loading step. BD forces the loader to load a HugeGraph, so maybe there is a problem there. I'm going to create a small test for this. Thanks.

tomasonjo commented 5 years ago

Found some weird behaviour here...

Using Neo4j 3.5.2 and graph algos 3.5.3.3 in Neo4j Browser.

On the first run of balanced triads after starting the db, the bug above shows up: the loader stays at 1%. On the second run I got the algo to complete a couple of times. After restarting the db the same pattern repeated, with balanced triads working only on the second run.

After a couple of iterations this stopped working, which is really confusing and I can't get it to work anymore.

To reproduce on a smaller graph:

UNWIND range(1,100) as x
CREATE (:Node{id:x});

MATCH (n1:Node),(n2:Node)
WHERE n1.id > n2.id*3
MERGE (n1)-[:LINK{weight:1.0}]-(n2);

and then run:

CALL algo.balancedTriads('Node', 'LINK', {weightProperty:'weight'}) 
YIELD loadMillis, computeMillis, writeMillis, nodeCount, balancedTriadCount, unbalancedTriadCount;
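For anyone who wants reference numbers to compare against once the procedure does complete, here is a rough brute-force sketch in plain Python. It rebuilds the same 100-node graph in memory and counts triangles, treating a triangle as balanced when the product of its three edge weights is positive and unbalanced when it is negative. That definition is my reading of what balanced triads means. The function names are hypothetical, and whether `algo.balancedTriads` reports its totals on exactly the same basis is not confirmed here.

```python
from itertools import combinations


def build_repro_edges(n=100):
    """Same rule as the Cypher above: link n1 and n2 when n1 > n2 * 3."""
    edges = {}
    for n1 in range(1, n + 1):
        for n2 in range(1, n + 1):
            if n1 > n2 * 3:
                edges[frozenset((n1, n2))] = 1.0  # weight as in the repro
    return edges


def count_triads(nodes, edges):
    """Count balanced and unbalanced triangles by brute force."""
    balanced = unbalanced = 0
    for a, b, c in combinations(nodes, 3):
        w_ab = edges.get(frozenset((a, b)))
        w_bc = edges.get(frozenset((b, c)))
        w_ac = edges.get(frozenset((a, c)))
        if w_ab is None or w_bc is None or w_ac is None:
            continue  # not a closed triangle
        product = w_ab * w_bc * w_ac
        if product > 0:
            balanced += 1
        elif product < 0:
            unbalanced += 1
    return balanced, unbalanced


repro_edges = build_repro_edges()
balanced, unbalanced = count_triads(range(1, 101), repro_edges)
print(balanced, unbalanced)
```

Since every weight in the repro graph is 1.0, no triangle can come out unbalanced, so any non-zero `unbalancedTriadCount` from the procedure on this graph would point to a problem beyond loading.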

As there seems to be a problem with the graph loading, I also tried just loading the graph on its own:

CALL algo.graph.load('my-graph','Node','LINK',{graph:'huge',weightProperty:'weight'})
  YIELD name, graph, direction, undirected, sorted, nodes, loadMillis, alreadyLoaded,
        nodeWeight, relationshipWeight, nodeProperty, loadNodes, loadRelationships;

and the bug persists.