Closed: tomasonjo closed this issue 1 year ago
Thanks for reporting the bug! It's been a while since we saw a :bat: bug in our code. With your very helpful explanations, I am sure we will get to the bottom of it :)
This is now fixed in patch release 2.3.2. Do let us know if you still see any issues with GraphSAGE.
GDS version: 2.3.1
Neo4j version: 5.5.0
Operating system: Ubuntu 20.04
I have a case where GraphSAGE returns NaN values if I use relationship weights. If I don't use relationship weights, the problem does not occur. I tried to reproduce on a smaller dataset, but I couldn't manage to do it.
The dump is available at: https://drive.google.com/file/d/1InMCuyjJaj2RGJ6eelx0Q2oPSddfafvZ/view?usp=share_link
It's basically 50k Medium posts with OpenAI embeddings (2GB unzipped). I have listed every step I run to reproduce the error (some might be redundant).
Project the graph:
Infer monopartite network:
Run the WCC algorithm:
CALL gds.wcc.write('articles', {writeProperty:'wcc', nodeLabels:['Article'], relationshipTypes:['SIMILAR']})
Use the results of the WCC algorithm to find the best start node for graph sampling:
Train the model
Stream the embeddings
Returns
The problem goes away if I don't include relationshipWeightProperty during training.
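For anyone checking whether they are affected, here is a minimal Python sketch that flags nodes whose streamed embedding contains NaN. It assumes the stream result has already been collected client-side as rows with `nodeId` and `embedding` fields; the helper name `nodes_with_nan` and the sample rows are hypothetical and are not taken from the dataset above.

```python
import math
from typing import Iterable, List, Mapping


def nodes_with_nan(rows: Iterable[Mapping]) -> List[int]:
    """Return the nodeId of every row whose embedding contains at least one NaN."""
    bad = []
    for row in rows:
        if any(math.isnan(x) for x in row["embedding"]):
            bad.append(row["nodeId"])
    return bad


# Hypothetical rows in the shape (nodeId, embedding); not real output from the issue.
sample_rows = [
    {"nodeId": 0, "embedding": [0.12, -0.40, 0.03]},
    {"nodeId": 1, "embedding": [float("nan"), float("nan"), float("nan")]},
]

print(nodes_with_nan(sample_rows))
```

Running the same check on a model trained with and without the weight property should show whether the NaN values appear only in the weighted case.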