neo4j-contrib / neo4j-graph-algorithms

Efficient Graph Algorithms for Neo4j
https://github.com/neo4j/graph-data-science/
GNU General Public License v3.0
767 stars 196 forks source link

UnionFind doesn't merge the connected components #920

Closed kowaalczyk closed 4 years ago

kowaalczyk commented 4 years ago

I tried to use the unionFind algorithm, but I was unable to get the correct results: it assigned every node to its own component, without merging the connected nodes.

After more investigation I couldn't even reproduce the example from official documentation:

Connected to Neo4j 3.5.13 at bolt://localhost:7687.
Type :help for a list of available commands or :exit to exit the shell.
Note that Cypher queries must end with a semicolon.

neo4j> CREATE (nAlice:User {name: 'Alice'});
0 rows available after 108 ms, consumed after another 0 ms
Added 1 nodes, Set 1 properties, Added 1 labels

neo4j> CREATE (nBridget:User {name: 'Bridget'});
0 rows available after 4 ms, consumed after another 0 ms
Added 1 nodes, Set 1 properties, Added 1 labels

neo4j> CREATE (nCharles:User {name: 'Charles'});
0 rows available after 5 ms, consumed after another 0 ms
Added 1 nodes, Set 1 properties, Added 1 labels

neo4j> CREATE (nDoug:User {name: 'Doug'});
0 rows available after 6 ms, consumed after another 0 ms
Added 1 nodes, Set 1 properties, Added 1 labels

neo4j> CREATE (nMark:User {name: 'Mark'});
0 rows available after 3 ms, consumed after another 0 ms
Added 1 nodes, Set 1 properties, Added 1 labels

neo4j> CREATE (nMichael:User {name: 'Michael'});
0 rows available after 3 ms, consumed after another 0 ms
Added 1 nodes, Set 1 properties, Added 1 labels

neo4j> CREATE (nAlice)-[:LINK {weight: 0.5}]->(nBridget);
0 rows available after 53 ms, consumed after another 0 ms
Added 2 nodes, Created 1 relationships, Set 1 properties

neo4j> CREATE (nAlice)-[:LINK {weight: 4}]->(nCharles);
0 rows available after 4 ms, consumed after another 0 ms
Added 2 nodes, Created 1 relationships, Set 1 properties

neo4j> CREATE (nMark)-[:LINK {weight: 1.1}]->(nDoug);
0 rows available after 4 ms, consumed after another 0 ms
Added 2 nodes, Created 1 relationships, Set 1 properties

neo4j> CREATE (nMark)-[:LINK {weight: 2}]->(nMichael);
0 rows available after 4 ms, consumed after another 0 ms
Added 2 nodes, Created 1 relationships, Set 1 properties

neo4j> call algo.unionFind('User', 'LINK')
       yield setCount;
1 row available after 1372 ms, consumed after another 0 ms
+----------+
| setCount |
+----------+
| 6        |
+----------+

neo4j> match (u:User) return *;
+-----------------------------------------+
| u                                       |
+-----------------------------------------+
| (:User {name: "Alice", partition: 0})   |
| (:User {name: "Bridget", partition: 1}) |
| (:User {name: "Charles", partition: 2}) |
| (:User {name: "Doug", partition: 3})    |
| (:User {name: "Mark", partition: 4})    |
| (:User {name: "Michael", partition: 5}) |
+-----------------------------------------+

6 rows available after 24 ms, consumed after another 1 ms

The expected result would be 2 connected components (partitions), but as you can see I got 6.

My setup is:

I'm not sure what the issue is, there is a chance I may have done something wrong - I started using neo4j only yesterday so I'm not really familiar with the project, and I'll greatly appreciate some help :)

AliciaFrame commented 4 years ago

You're getting a different result because of your CREATE statements for the relationships - it looks like, instead of running the query as a block of code, you're running each line individually. When you use CREATE (nAlice:User {name: 'Alice'}) you temporarily create a variable (nAlice) that refers to the node with the name of Alice, however, the variable doesn't persist between queries -- so it's not available when you run CREATE (nAlice)-[:LINK {weight: 0.5}]->(nBridget).

So if you run:

CREATE (nAlice:User {name: 'Alice'})
CREATE (nBridget:User {name: 'Bridget'})
CREATE (nCharles:User {name: 'Charles'})
CREATE (nDoug:User {name: 'Doug'})
CREATE (nMark:User {name: 'Mark'})
CREATE (nMichael:User {name: 'Michael'})

CREATE (nAlice)-[:LINK {weight: 0.5}]->(nBridget)
CREATE (nAlice)-[:LINK {weight: 4}]->(nCharles)
CREATE (nMark)-[:LINK {weight: 1.1}]->(nDoug)
CREATE (nMark)-[:LINK {weight: 2}]->(nMichael);

As a single Cypher statement, then call the algo, you should get the results you expect :)