neo4j / graph-data-science

Source code for the Neo4j Graph Data Science library of graph algorithms.
https://neo4j.com/docs/graph-data-science/current/

'OutOfMemoryError: unable to create native thread' thrown when running GDS algorithms many times #132

Closed littlemilkwu closed 2 years ago

littlemilkwu commented 3 years ago

Describe the bug When running an algorithm like PageRank many times, the thread count of the Java process shown in Activity Monitor grows very large and eventually causes an error like

Caused by: java.lang.OutOfMemoryError: unable to create native thread: possibly out of memory or process/resource limits reached

Also, the threads used by the Java process are not released automatically until I restart the whole Neo4j Desktop.

To Reproduce GDS version: 1.7.0 Neo4j version: 4.3.3 Operating system: macOS Big Sur 11.5.2

Steps to reproduce the behavior:

  1. Java threads were around 100 at the beginning. (screenshot: 2021-09-26, 5:47 PM, omitted)

Follow the PageRank example in the GDS 1.7 docs (https://neo4j.com/docs/graph-data-science/1.7/algorithms/page-rank/) and submit all Cypher queries in the Neo4j Browser:

  1. Create nodes and relationships

    CREATE
    (home:Page {name:'Home'}),
    (about:Page {name:'About'}),
    (product:Page {name:'Product'}),
    (links:Page {name:'Links'}),
    (a:Page {name:'Site A'}),
    (b:Page {name:'Site B'}),
    (c:Page {name:'Site C'}),
    (d:Page {name:'Site D'}),
    
    (home)-[:LINKS {weight: 0.2}]->(about),
    (home)-[:LINKS {weight: 0.2}]->(links),
    (home)-[:LINKS {weight: 0.6}]->(product),
    (about)-[:LINKS {weight: 1.0}]->(home),
    (product)-[:LINKS {weight: 1.0}]->(home),
    (a)-[:LINKS {weight: 1.0}]->(home),
    (b)-[:LINKS {weight: 1.0}]->(home),
    (c)-[:LINKS {weight: 1.0}]->(home),
    (d)-[:LINKS {weight: 1.0}]->(home),
    (links)-[:LINKS {weight: 0.8}]->(home),
    (links)-[:LINKS {weight: 0.05}]->(a),
    (links)-[:LINKS {weight: 0.05}]->(b),
    (links)-[:LINKS {weight: 0.05}]->(c),
    (links)-[:LINKS {weight: 0.05}]->(d);
  2. Project the GDS graph

    CALL gds.graph.create(
      'myGraph',
      'Page',
      'LINKS',
      {
        relationshipProperties: 'weight'
      }
    )
  3. Run the PageRank algorithm many times (UNWIND is used only as a loop)

    UNWIND range(0, 8000) as no_need
    CALL gds.pageRank.stream('myGraph')
    YIELD nodeId, score
    RETURN gds.util.asNode(nodeId).name AS name, score
    ORDER BY score DESC, name ASC
  4. OOM error (screenshot omitted)

  5. The thread count reaches 409X, causing this error (screenshot omitted)

Expected behavior I thought the Java threads would be released automatically after a few minutes.

Additional context At first, I wrote a Python script using the Neo4j Python Driver 4.3 to run personalized PageRank many times. In my dataset there are about 5 thousand nodes I need to apply PageRank to, and that is when I ran into this problem. I thought maybe I wasn't closing sessions properly, but I open and close a session on every PageRank run and the problem is still there. Even using the Neo4j Browser hits this problem too.
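The driver-side pattern described above (a fresh session opened and closed per PageRank run) can be sketched roughly like this; the URI and credentials are placeholders, not from the issue, and `myGraph` refers to the projection created earlier:

```python
# Minimal sketch of the reported usage: one session per PageRank run,
# closed right away -- the pattern that still leaves JVM threads behind.
# URI/credentials are placeholders; 'myGraph' matches the projection above.
PAGERANK_QUERY = """
CALL gds.pageRank.stream($graph)
YIELD nodeId, score
RETURN gds.util.asNode(nodeId).name AS name, score
ORDER BY score DESC, name ASC
"""

def run_pagerank_many_times(uri, user, password, runs=8000, graph="myGraph"):
    # Imported lazily so the sketch can be read without the driver installed.
    from neo4j import GraphDatabase

    driver = GraphDatabase.driver(uri, auth=(user, password))
    try:
        for _ in range(runs):
            # Open and close a session on every run, as the reporter describes.
            with driver.session() as session:
                session.run(PAGERANK_QUERY, graph=graph).consume()
    finally:
        driver.close()
```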

In neo4j.conf I only edited these settings:

dbms.memory.heap.initial_size=2G
dbms.memory.heap.max_size=8G
dbms.memory.pagecache.size=4G
s1ck commented 2 years ago

@littlemilkwu Thanks for bringing this issue to our attention. And sorry for the late reply.

Two comments:

1. The problem you're facing is certainly interesting, but unfortunately it won't be addressed with the highest priority, since this is not an officially supported usage pattern.
2. That being said, if you want to help debug this (which we would appreciate), I'd recommend starting by monitoring the JVM while the loop is running and identifying code that constantly allocates. You could use Flight Recorder or Java VisualVM. Any insights you find can help us identify and eventually fix a potential leak.

Thanks.

s1ck commented 2 years ago

Closing this for now. Feel free to re-open if you want to continue the discussion.

littlemilkwu commented 2 years ago

In my real use case, I first create an in-memory projected graph with 6017 nodes and 37819 relationships. Then I run personalized PageRank only 3 times, each time with 2276 nodes as sourceNodes.

What can I do in this situation? I have been suffering from this error for a long time...

Below are my Cypher queries:

CALL gds.graph.create(
    'UP',
    ['User', 'Post'],
    {
        Action: { 
            orientation: 'UNDIRECTED',
            properties: 'weight'
        }
    }
)
MATCH (sources:User)
WITH sources
CALL gds.pageRank.stream('UP', {
    sourceNodes: [sources],
    relationshipWeightProperty: 'weight',
    concurrency: 1
})
YIELD nodeId, score
WITH sources, gds.util.asNode(nodeId) as nodes, score
WHERE (nodes.pid IS NOT NULL) AND (NOT (sources)-[]->(nodes))
RETURN sources.uid as source, nodes.pid as pid, score
ORDER BY score DESC

It still throws the same error. (Screenshots of neo4j.log, query.log, and debug.log omitted.)
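The batching described in this comment (a few pageRank calls with roughly 2276 sourceNodes each, rather than one call per node) can be driven from the Python side along these lines; the chunking helper and node-id list are illustrative, and passing node ids to `sourceNodes` via a parameter is an assumption here:

```python
# Illustrative sketch of batching sourceNodes for personalized PageRank,
# mirroring the 3 calls with ~2276 source nodes each described above.
# Graph name 'UP' is from the issue; node ids and the session are
# placeholders supplied by the caller.
def chunked(items, size):
    """Split a list into consecutive batches of at most `size` elements."""
    return [items[i:i + size] for i in range(0, len(items), size)]

PPR_QUERY = """
CALL gds.pageRank.stream('UP', {
    sourceNodes: $sources,
    relationshipWeightProperty: 'weight',
    concurrency: 1
})
YIELD nodeId, score
RETURN nodeId, score
ORDER BY score DESC
"""

def run_batches(session, source_node_ids, batch_size=2276):
    # One gds.pageRank.stream call per batch, reusing a single session
    # instead of opening a new one per source node.
    for batch in chunked(source_node_ids, batch_size):
        session.run(PPR_QUERY, sources=batch).consume()
```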
