neo4j / graph-data-science

Source code for the Neo4j Graph Data Science library of graph algorithms.
https://neo4j.com/docs/graph-data-science/current/

RandomWalk hitting hard-coded 100 second timeout #260

Closed hindog closed 1 year ago

hindog commented 1 year ago

Is your feature request related to a problem? Please describe.

When calling the following on a fairly large graph (168,973 nodes, 5,675,864 relationships):

CALL gds.randomWalk.stream(
  'myGraph',
  {
    walkLength: 3,
    walksPerNode: 1,
    randomSeed: 42,
    concurrency: 1
  }
)
YIELD nodeIds, path
RETURN nodeIds, [node IN nodes(path) | node.label ] AS labels

The algorithm always returns an empty result after about 100,000 ms (100 seconds), which suggests that a timeout is being hit. I went through the code and found this hard-coded timeout in RandomWalk:

https://github.com/neo4j/graph-data-science/blob/bca2cbddf719b1d697bfac93d677b6bd35aee625/algo/src/main/java/org/neo4j/gds/traversal/RandomWalk.java#L188

NOTE: when calling node2vec, we get an error instead of an empty result. The following call fails:

CALL gds.beta.node2vec.stream('myGraph', {embeddingDimension: 2})
YIELD nodeId, embedding
RETURN nodeId, embedding

// fails with:
// Neo.ClientError.Procedure.ProcedureCallFailed: Failed to invoke procedure gds.beta.node2vec.stream: Caused by: java.lang.IllegalArgumentException: Unknown subtask: create walks

Describe the solution you would like

A configurable timeout value or some way to work around the 100s timeout.
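
To illustrate the kind of change I mean, here is a minimal sketch in Python. It is not the library's code, and I have not confirmed that RandomWalk works this way internally; the names `stream_walks`, `slow_walks` and `timeout_seconds` are made up for the example. It only shows how a consumer that waits a fixed time on a queue can silently return an empty result when the producer is slow, and how passing the wait time in as a parameter avoids that.

```python
import queue
import threading
import time

_DONE = object()  # sentinel that marks the end of the stream


def stream_walks(produce, timeout_seconds):
    """Yield items from produce() through a bounded queue.

    If no item arrives within timeout_seconds, the consumer gives up
    silently, so the caller sees a truncated (possibly empty) result.
    """
    buffer = queue.Queue(maxsize=4)

    def worker():
        # Producer side: push every walk, then the end marker.
        for item in produce():
            buffer.put(item)
        buffer.put(_DONE)

    threading.Thread(target=worker, daemon=True).start()

    while True:
        try:
            item = buffer.get(timeout=timeout_seconds)
        except queue.Empty:
            return  # timed out: no error is raised, the stream just ends
        if item is _DONE:
            return
        yield item


def slow_walks():
    """Hypothetical producer whose first walk takes a while to compute."""
    time.sleep(0.3)
    yield [0, 1, 2]
    yield [1, 2, 0]


if __name__ == "__main__":
    # A wait shorter than the producer's start-up time yields nothing.
    print(list(stream_walks(slow_walks, timeout_seconds=0.05)))
    # A caller-supplied, longer wait returns both walks.
    print(list(stream_walks(slow_walks, timeout_seconds=2.0)))
```

With the short wait the caller gets an empty list and no error, which matches the symptom described above. With the longer wait the same producer returns all of its walks.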

Describe alternatives you have considered

If the timeout cannot be adjusted, we may need to fork and patch the library.