Closed joyemang33 closed 10 months ago
Hello again, thanks for submitting another very well-written bug report!
On a first glance, the different result on multiple runs, hints towards a concurrency bug.
So you could try to fix concurrency=1
as a temporary workaround.
Many thanks for your so quick response again! I also agree with you that the root cause may be in the implementation of the concurrency. When I set concurrency = 1
, I obtain a new and stable result 3 different from the previous two results.
Results3
nodeA nodeB nodeC
0 0 1 2
1 0 2 7
2 0 2 9
3 0 4 7
4 0 4 8
5 0 4 9
6 0 7 8
7 1 2 3
8 2 3 7
9 2 3 9
10 4 7 8
11 5 7 8
It doesn't contain the non-existing triangle (5, 6, 7) but has 11 triangles, which indicates that Result1 and Result2 may both be incorrect.
Many thanks for your so quick response again! I also agree with you that the root cause may be in the implementation of the concurrency. When I set
concurrency = 1
, I obtain a new and stable result 3 different from the previous two results.Results3
nodeA nodeB nodeC 0 0 1 2 1 0 2 7 2 0 2 9 3 0 4 7 4 0 4 8 5 0 4 9 6 0 7 8 7 1 2 3 8 2 3 7 9 2 3 9 10 4 7 8 11 5 7 8
It doesn't contain the non-existing triangle (5, 6, 7) but has 11 triangles, which indicates that Result1 and Result2 may both be incorrect.
PS: When I set concurrency = 1
and run my testing tools again, I found another bug-triggering test case for gds.alpha.triangles
. I believe that it is different from the potential concurrency bug in this issue, and I will report it once this issue has been confirmed.
Hi @FlorentinD! Here is another issue even though I set concurrency = 1
.
Run the following Python code:
import pandas as pd
if __name__ == "__main__":
nodes = pd.read_csv('G2_node.csv')
edges = pd.read_csv('G2_edge.csv')
G = gds.graph.construct(
graph_name="test",
nodes=nodes,
relationships=edges,
undirected_relationship_types= ["T0", "T1", "T2", "T3", "T4"]
)
print(gds.alpha.triangles(G, concurrency=1))
Result:
nodeA nodeB nodeC
0 0 1 8
1 0 1 9
2 0 3 8
3 0 3 9
4 1 4 7
5 2 6 8
6 3 4 7
However, the result contains a non-existing triangle (0, 3, 9) since the graph does not contain the edge between node 3 and node 9. I believe that it may indicate a unique bug in this algorithm since the results of one concurrency is invariant. Could you further investigate and confirm it?
The graph data is here: G2_edge.csv G2_node.csv
Neo4j Version: 5.10 Operating System: Ubuntu 20.04 API: Docker
Hello! When I use Graph Data Science to test the algorithm
gds.alpha.triangles
, I found that it returns unstable results for the same graph.Steps to reproduce
Run the following Python code:
Interestingly, running the code multiple times can yield two inconsistent results:
Result1:
Result2:
After my investigation, the result1 is obviously wrong since there is no edge between the node 6 and node 7 in the data but (5, 6, 7) is listed in line 8 of the result. I am not sure what the root cause of the strange behavior is, but it may indicate a potential bug in Neo4j's triangle algorithm. Could you further investigate and confirm it? It would be highly appreciated!
Here is the graph data: G_edge.csv G_node.csv
Best regards, Joye