neo4j / graph-data-science

Source code for the Neo4j Graph Data Science library of graph algorithms.
https://neo4j.com/docs/graph-data-science/current/

GraphSAGE returns NaN when using relationship weights #250

Closed: tomasonjo closed this issue 1 year ago

tomasonjo commented 1 year ago

GDS version: 2.3.1
Neo4j version: 5.5.0
Operating system: Ubuntu 20.04

I have a case where GraphSAGE returns NaN values if I use relationship weights. If I don't use relationship weights, the problem does not occur. I tried to reproduce on a smaller dataset, but I couldn't manage to do it.

The dump is available at: https://drive.google.com/file/d/1InMCuyjJaj2RGJ6eelx0Q2oPSddfafvZ/view?usp=share_link It's basically 50k Medium posts with OpenAI embeddings (2 GB unzipped). I've included all the steps I run to reproduce the error (some might be redundant).

Project the graph:

CALL gds.graph.project(
    'articles',
    ['Article', 'List'],
    'IN_LIST',
    {nodeProperties:['openaiEmbedding']}
)

Infer monopartite network:

CALL gds.nodeSimilarity.mutate('articles', 
  {topK:2000, mutateProperty:'score', mutateRelationshipType:'SIMILAR'})

Run the WCC algorithm:

CALL gds.wcc.write('articles', {writeProperty:'wcc', nodeLabels:['Article'], relationshipTypes:['SIMILAR']})

Use the results of the WCC algorithm to find the best start node for graph sampling:

MATCH (a:Article)
WITH a.wcc AS wcc, count(*) AS count, collect(a)[0] AS node
ORDER BY count DESC LIMIT 1
CALL gds.alpha.graph.sample.rwr('trainGraph', 'articles', 
  {samplingRatio:0.20, startNodes:[id(node)], nodeLabels:['Article'], relationshipTypes:['SIMILAR']})
YIELD graphName
RETURN graphName

Train the model:

CALL gds.beta.graphSage.train('trainGraph', 
  {embeddingDimension:64, relationshipWeightProperty:'score', 
  featureProperties:['openaiEmbedding'], modelName:'testModel'})

Stream the embeddings:

CALL gds.beta.graphSage.stream('articles', 
  {modelName:'testModel', nodeLabels:['Article'], relationshipTypes:['SIMILAR']})
YIELD nodeId, embedding
RETURN * LIMIT 5

Returns:

[Screenshot from 2023-02-24 10-36-14]
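
To quantify the result rather than eyeball the first five rows, the streamed embeddings can be scanned for NaN components. This is only a sketch: it reuses the graph and model names from the steps above and assumes the Cypher `isNaN()` function is available (it should be on Neo4j 5.5).

```cypher
// Count how many streamed embeddings contain at least one NaN component.
CALL gds.beta.graphSage.stream('articles',
  {modelName:'testModel', nodeLabels:['Article'], relationshipTypes:['SIMILAR']})
YIELD nodeId, embedding
RETURN count(*) AS totalNodes,
       sum(CASE WHEN any(x IN embedding WHERE isNaN(x)) THEN 1 ELSE 0 END) AS nodesWithNaN
```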

The problem goes away if I don't include relationshipWeightProperty during training.
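
For comparison, the unweighted variant is the same training call with relationshipWeightProperty left out. The model name 'testModelUnweighted' is a hypothetical name chosen here so that it does not clash with 'testModel'; everything else is copied from the steps above.

```cypher
// Same training configuration as before, minus relationshipWeightProperty.
CALL gds.beta.graphSage.train('trainGraph',
  {embeddingDimension:64,
  featureProperties:['openaiEmbedding'], modelName:'testModelUnweighted'});

// Stream with the unweighted model to compare against the NaN output.
CALL gds.beta.graphSage.stream('articles',
  {modelName:'testModelUnweighted', nodeLabels:['Article'], relationshipTypes:['SIMILAR']})
YIELD nodeId, embedding
RETURN * LIMIT 5;
```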

FlorentinD commented 1 year ago

Thanks for reporting the bug! It's been a while since we saw a :bat: bug in our code. With your very helpful explanations, I am sure we will get to the bottom of it :)

brs96 commented 1 year ago

This is now fixed in the patch release 2.3.2. Do let us know if any issues with GraphSAGE remain.
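
Before re-running the reproduction steps, it may help to confirm which GDS version the database is actually running. A minimal check, assuming the library is installed and the `gds.version()` function is callable:

```cypher
// Should report 2.3.2 or later once the patched library is installed.
RETURN gds.version() AS gdsVersion
```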