neo4j / graph-data-science

Source code for the Neo4j Graph Data Science library of graph algorithms.
https://neo4j.com/docs/graph-data-science/current/

CELF may sometimes generate negative spread for nodes #255

Closed johnlinp closed 1 year ago

johnlinp commented 1 year ago

Describe the bug

The CELF algorithm sometimes returns a negative spread for nodes in the seed set.

To Reproduce

  1. Import the graph data
  2. Project the graph
  3. Call CELF to calculate the spread for the seed set nodes:
    CALL gds.beta.influenceMaximization.celf.stream('myGraph', {seedSetSize: 10});
  4. The result shows:
    +-----------------------------+
    | nodeId | spread             |
    +-----------------------------+
    | 78875  | 50.67000000000007  |
    | 158704 | 29.639999999999986 |
    | 65595  | -228.68            |
    | 31602  | 127.13             |
    | 101655 | 54.84000000000003  |
    | 37442  | 42.039999999999964 |
    | 25207  | 29.41999999999996  |
    | 54514  | 429.31             |
    | 4846   | 153.73000000000002 |
    | 152001 | 59.819999999999936 |
    +-----------------------------+

As you can see, node 65595 has a spread of -228.68, which is negative.
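For anyone who wants to check their own output for the same symptom, here is a minimal Python sketch that scans result rows for a negative spread. The `rows` list is just the table above copied by hand; how the rows are fetched from Neo4j is left out, and the variable names are only for illustration.

```python
# Rows copied from the CELF stream output above: (nodeId, spread).
rows = [
    (78875, 50.67000000000007),
    (158704, 29.639999999999986),
    (65595, -228.68),
    (31602, 127.13),
    (101655, 54.84000000000003),
    (37442, 42.039999999999964),
    (25207, 29.41999999999996),
    (54514, 429.31),
    (4846, 153.73000000000002),
    (152001, 59.819999999999936),
]

# Collect every seed node whose reported spread is below zero.
negative = [(node_id, spread) for node_id, spread in rows if spread < 0]
print(negative)  # [(65595, -228.68)]
```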

I'm sorry, but I can't share my graph data here because of privacy concerns. The graph has about 50k nodes and about 180k relationships.

GDS version: 2.2.0
Neo4j version: 4.4.10
Operating system: Debian

Expected behavior

According to the definition of "spread" in CELF, it is the number of nodes that become influenced by a given node. Therefore, I assume the spread should always be zero or a positive number. A negative spread doesn't make sense to me.
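To illustrate why I expect a non-negative value, here is a rough Python sketch of a Monte Carlo estimate of the marginal spread under the independent cascade model, using live-edge sampling. This is not the GDS implementation; the function names, the default probability, and the number of simulations are my own assumptions for illustration. Because each simulation evaluates the seed set with and without the candidate on the same sampled edges, the set of reached nodes can only grow, so the estimated gain cannot go below zero.

```python
import random
from collections import deque


def sample_live_edges(edges, p, rng):
    """Keep each directed edge independently with probability p."""
    return [(u, v) for (u, v) in edges if rng.random() < p]


def reachable(seeds, live_edges):
    """Return all nodes reachable from the seeds via live edges (BFS)."""
    adjacency = {}
    for u, v in live_edges:
        adjacency.setdefault(u, []).append(v)
    seen = set(seeds)
    queue = deque(seeds)
    while queue:
        u = queue.popleft()
        for v in adjacency.get(u, []):
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen


def marginal_spread(edges, seeds, candidate, p=0.1, simulations=100, seed=42):
    """Estimate how many extra nodes are influenced by adding `candidate`.

    Hypothetical helper: the parameter defaults are illustrative only.
    """
    rng = random.Random(seed)
    total = 0
    for _ in range(simulations):
        live = sample_live_edges(edges, p, rng)
        before = reachable(seeds, live)
        after = reachable(list(seeds) + [candidate], live)
        # `after` is always a superset of `before`, so this term is >= 0.
        total += len(after) - len(before)
    return total / simulations


if __name__ == "__main__":
    toy_edges = [(0, 1), (1, 2), (3, 4)]
    print(marginal_spread(toy_edges, seeds=[0], candidate=3, p=0.5))
```

An estimator that draws separate random samples for the two seed sets could in principle produce a negative difference through sampling noise, but I don't know whether that is what happens here.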

vnickolov commented 1 year ago

@johnlinp thank you for reporting this. We are going to look into it and will get back to you. Could you share the output of:

CALL gds.graph.list('myGraph')
YIELD graphName, degreeDistribution;

This will help us to create a synthetic dataset close to the one you have.

Thank you again. V.

vnickolov commented 1 year ago

@johnlinp we have identified the issue and are working on a fix. It should be included in the next release, and we will let you know when it is out.

johnlinp commented 1 year ago

Thank you so much!

vnickolov commented 1 year ago

@johnlinp sorry for the delay. The fix should now be available in GDS 2.3.2. Please give it a try and let us know if we can close this issue.