neo4j / graph-data-science

Source code for the Neo4j Graph Data Science library of graph algorithms.
https://neo4j.com/docs/graph-data-science/current/

I can find the connected nodes, but the similarity calculated by gds.nodeSimilarity.stream() is 0 #311

Closed aruig666 closed 3 weeks ago

aruig666 commented 1 month ago

I search for nodes and their adjacent nodes by name, as shown in the code and figure:

MATCH (n:Operation {name: 'F1_ROUGHTOPFACE_P1'})-[:Member|:Use]-(a)
MATCH (m:Operation {name: 'F2_SPOTDRILL_P1'})-[:Member|:Use]-(b)
RETURN n, m, a, b

[Screenshot: Snipaste_2024-05-30_11-57-35]

But when I use gds.nodeSimilarity.stream() to calculate the similarity, the code and the result are as shown:

CALL gds.nodeSimilarity.stream('myGraph')
YIELD node1, node2, similarity
WHERE gds.util.asNode(node1).name = 'F1_ROUGHTOPFACE_P1'
  AND gds.util.asNode(node2).name = 'F2_SPOTDRILL_P1'
RETURN gds.util.asNode(node1).name AS Process1, gds.util.asNode(node2).name AS Process2, similarity
ORDER BY similarity

[Screenshot: Snipaste_2024-05-30_12-00-59]

My operating environment is:
neo4j-community-5.20.0
Neo4j Graph Data Science Library v2.6.7
Windows 11 Family Chinese Edition v23H2

IoannisPanagiotas commented 1 month ago

Hi @aruig666,

I have transferred your question to the appropriate repository. Could you please let me know which projection query you used to create myGraph?

Best regards, Ioannis.

aruig666 commented 3 weeks ago

neo4jtest.zip

I have uploaded my code and data. You can run the 'processneomodel2.py' file to generate my graph from the two JSON files. The query is the one described above. Please help me look into the problem I encountered.

IoannisPanagiotas commented 3 weeks ago

Hi again @aruig666,

Thank you for sharing your data. I ran Node Similarity after projecting everything: CALL gds.graph.project('mygraph', '*', '*')

When I executed your query, I did not get any results back either. However, by tweaking it slightly as follows, I did obtain some results:

CALL gds.nodeSimilarity.stream('mygraph', {topk: 20})
YIELD node1, node2, similarity
WHERE gds.util.asNode(node1).name = 'F1_ROUGHTOPFACE_P1'
  AND gds.util.asNode(node2).name = 'F2_SPOTDRILL_P1'
RETURN gds.util.asNode(node1).name AS Process1, gds.util.asNode(node2).name AS Process2, similarity
ORDER BY similarity

"F1_ROUGHTOPFACE_P1" | "F2_SPOTDRILL_P1" | 0.2
"F1_ROUGHTOPFACE_P1" | "F2_SPOTDRILL_P1" | 0.2

What happens is that the Node Similarity algorithm does not return all results, only the top ones (i.e., either the topN largest similarities globally or the topK largest similarities per node). By default, topK is 10, so only the 10 most similar nodes are kept for each node.

In your case, these two similarities are 0.2, which is smaller than the others, so they get discarded. To work around this, we must change the configuration, similar to what I did above. I invite you to have a look at the relevant section in the documentation.
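To illustrate the effect, here is a minimal Python sketch of per-node topK filtering over Jaccard similarities. It is not the GDS implementation: the function names, node names, and neighbor sets are made up for illustration, and the real algorithm has further options (such as similarity and degree cutoffs) that are left out here.

```python
def jaccard(a, b):
    """Jaccard similarity of two neighbor sets: |a & b| / |a | b|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def top_k_similar(neighbors, top_k):
    """For each node, keep only its top_k most similar other nodes.

    `neighbors` maps a node name to the set of nodes it is connected to.
    Pairs with similarity 0 are dropped before the cut.
    """
    results = {}
    for node, own in neighbors.items():
        scored = [
            (other, jaccard(own, theirs))
            for other, theirs in neighbors.items()
            if other != node
        ]
        scored = [(other, score) for other, score in scored if score > 0]
        # Highest similarity first; ties broken by name for a stable order.
        scored.sort(key=lambda pair: (-pair[1], pair[0]))
        results[node] = scored[:top_k]
    return results


# Hypothetical toy data, not taken from the uploaded graph.
neighbors = {
    "OP_A": {"t1", "t2", "t3"},
    "OP_B": {"t1", "t2", "t3"},  # similarity to OP_A: 3/3
    "OP_C": {"t1", "t2", "t4"},  # similarity to OP_A: 2/4
    "OP_D": {"t3", "t5", "t6"},  # similarity to OP_A: 1/5
}

if __name__ == "__main__":
    for k in (2, 3):
        print(k, top_k_similar(neighbors, top_k=k)["OP_A"])
```

With top_k=2, OP_D does not appear among OP_A's results even though the two nodes share a neighbor; raising top_k to 3 brings the pair back. This is the same reason the pair with similarity 0.2 only shows up once the limit is raised.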

Let me know if that solves your problem.

Best regards, Ioannis.

aruig666 commented 3 weeks ago

I set topk=20 as you said, and the search results are normal now. Thank you very much.

IoannisPanagiotas commented 3 weeks ago

Good to hear!

I will close the issue now. Please let us know if you need any more help.

Ioannis.