neo4j / graph-data-science

Source code for the Neo4j Graph Data Science library of graph algorithms.
https://neo4j.com/docs/graph-data-science/current/

Scaling properties with default values #208

Closed. tomasonjo closed this issue 1 year ago.

tomasonjo commented 2 years ago

GDS version: 2.1.4 Neo4j version: 4.4.8 Operating system: Ubuntu 20.04

When a node property is projected with its default value, which is a very large negative number, the standard-score scaler returns NaN values:

Create graph:

CREATE (n1:Node), (n2:Node), (n3:Node)
CREATE (n1)-[:REL]->(n2);

Set a property on only some of the nodes (n3 has no relationship, so it gets no degree):

MATCH (n:Node)-[:REL]-()
WITH n, count(*) as degree
SET n.degree = degree

Project the graph:

CALL gds.graph.project('test', 'Node', 'REL', {nodeProperties:'degree'})

Scale the property:

CALL gds.alpha.scaleProperties.stream('test', {nodeProperties:['degree'], scaler:'STDSCORE'})

Results:

nodeId scaledProperty
0 [NaN]
1 [NaN]
2 [NaN]
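The all-NaN output fits how a standard score is computed. Every value is shifted by the mean and divided by the standard deviation, so one NaN input turns both statistics into NaN, and that NaN then reaches every node. The plain-Python sketch below illustrates this and is not the GDS implementation. It rests on two assumptions: the missing property reaches the scaler as NaN, and STDSCORE uses the population standard deviation. `std_score` and `rows_with_nan` are hypothetical helpers, and the second one is the kind of check the issue asks for.

```python
import math


def std_score(values):
    """Standard-score scaling: (x - mean) / std, using the population std."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((x - mean) ** 2 for x in values) / n
    std = math.sqrt(variance)  # math.sqrt(nan) returns nan rather than raising
    if std == 0:
        # Assumption: a constant column maps to 0.0 instead of dividing by zero.
        return [0.0] * n
    return [(x - mean) / std for x in values]


def rows_with_nan(rows):
    """Return the node ids whose scaled vector contains at least one NaN."""
    return [node_id for node_id, vector in rows if any(math.isnan(v) for v in vector)]


# Nodes 0 and 1 have degree 1; node 2 has no value (assumed to arrive as NaN).
degrees = [1.0, 1.0, float("nan")]
scaled = std_score(degrees)
rows = [(node_id, [value]) for node_id, value in enumerate(scaled)]
print(rows_with_nan(rows))  # [0, 1, 2]
```

This check could run over the stream output before the values reach a downstream task.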

This happened to me on a much larger graph where only a couple of nodes had the default value. The algorithm reports no problems, so you only find out in a downstream task. Other scalers also return odd results because the default value is such a large negative number. I don't know what the right solution is, but the user should be told when the results contain NaN values.
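Here is one way a large negative default can distort another scaler. The hypothetical sketch below assumes the missing value is Java's `Long.MIN_VALUE` (-2^63) stored as a double, and it applies min-max scaling `(x - min) / (max - min)`. It is plain Python, not GDS code. The sentinel is so much larger than the real values that double-precision rounding erases the real values completely: degrees 1, 2 and 3 all scale to exactly 1.0.

```python
# Assumption: the missing property is represented by Java's Long.MIN_VALUE.
LONG_MIN_VALUE = -(2 ** 63)


def min_max(values):
    """Min-max scaling: (x - min) / (max - min)."""
    low, high = min(values), max(values)
    if high == low:
        return [0.0] * len(values)
    return [(x - low) / (high - low) for x in values]


# Three real degrees plus one node carrying the sentinel, all as doubles.
with_sentinel = [1.0, 2.0, 3.0, float(LONG_MIN_VALUE)]
print(min_max(with_sentinel))     # [1.0, 1.0, 1.0, 0.0]
print(min_max([1.0, 2.0, 3.0]))   # [0.0, 0.5, 1.0]
```

Near 2^63 the spacing between adjacent doubles is 2048, so `x - low` rounds to the same value for every real degree. No NaN appears here, and the result is still meaningless.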

s1ck commented 1 year ago

If the node property you want to project needs a default value other than the one we define, you need to specify that in the node projection, like so:

CALL gds.graph.project('test', {Node: { properties: { degree : { defaultValue: 0.0 }}}}, 'REL')
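With `defaultValue: 0.0`, node n3 is projected with 0.0 instead of the large negative fallback, so the standard score stays finite. The plain-Python sketch below only mimics that behaviour. `project_with_default` is a hypothetical stand-in for the projection's `defaultValue`, and using the population standard deviation is an assumption about STDSCORE.

```python
import math
import statistics

NODE_COUNT = 3
# Only n1 and n2 (ids 0 and 1) received a degree property in the example.
stored_degrees = {0: 1, 1: 1}


def project_with_default(node_count, stored, default_value):
    """Hypothetical stand-in for a projection defaultValue: fill missing nodes."""
    return [float(stored.get(node_id, default_value)) for node_id in range(node_count)]


def std_score(values):
    """(x - mean) / population std, computed with the statistics module."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)
    return [(x - mean) / std for x in values]


projected = project_with_default(NODE_COUNT, stored_degrees, 0.0)
scaled = std_score(projected)
print(projected)                       # [1.0, 1.0, 0.0]
print([round(v, 4) for v in scaled])   # [0.7071, 0.7071, -1.4142]
```

Whether 0.0 is the right fill value depends on what a missing property means in your data. For `degree` in this example, 0 matches the fact that n3 has no relationships.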