neo4j-contrib / neo4j-apoc-procedures

Awesome Procedures On Cypher for Neo4j - codenamed "apoc"
https://neo4j.com/labs/apoc
Apache License 2.0
1.7k stars, 495 forks

OutOfMemoryError for apoc.meta.nodeTypeProperties and apoc.meta.relTypeProperties #1754

Closed: tbwiss closed this issue 3 years ago

tbwiss commented 3 years ago

Guidelines

Please note that GitHub issues are only meant for bug reports/feature requests. If you have questions on how to use APOC, please ask on the Neo4j Discussion Forum instead of creating an issue here.

Expected Behavior (Mandatory)

The procedures apoc.meta.nodeTypeProperties and apoc.meta.relTypeProperties execute without error.

Actual Behavior (Mandatory)

The procedures apoc.meta.nodeTypeProperties and apoc.meta.relTypeProperties fail with an OutOfMemoryError.

How to Reproduce the Problem

Simple Dataset (where it's possible)

Enron database (internal Neo4j DB): 1M nodes (5 labels), 13.5M relationships (5 types)

Steps (Mandatory)

  1. Run the DB on Neo4j Desktop (with APOC plugin installed)
  2. Open Neo4j Browser
  3. Execute `CALL apoc.meta.relTypeProperties` or `CALL apoc.meta.nodeTypeProperties`

Screenshots (where it's possible)

Full error logs:

Failed to invoke procedure `apoc.meta.nodeTypeProperties`: Caused by: java.lang.OutOfMemoryError: Java heap space
Failed to invoke procedure `apoc.meta.relTypeProperties`: Caused by: java.lang.OutOfMemoryError: Java heap space

Specifications (Mandatory)

Currently used versions: Neo4j Desktop 1.3.11

Versions

moxious commented 3 years ago

I believe the user needs to increase the heap, or there is a lot of heap contention at the time. This procedure doesn't use much memory, but the reported outcome is clearly possible if the machine is under heavy memory pressure at the time, or if very aggressive sampling options are used.

Can you provide:

conker84 commented 3 years ago

@tbwiss can you please answer @moxious's questions?

tbwiss commented 3 years ago

Thanks for replying, @moxious and @conker84. I was running the procedure locally on my Mac (32 GB RAM, max heap size 8192 MB, initial heap size 512 MB), and the following applications were running while I ran the commands in Neo4j Browser: VSCode, Chrome, Neo4j Desktop & Browser, Finder, Notes, Terminal. I didn't provide any options; I called the procedure as outlined in step 3 above. There are no array-typed properties in the DB; apart from one DateTime, there are only String properties.

moxious commented 3 years ago

@tbwiss I think we need some more hints to find what's going on here. Let me explain how this works and why this error should be unlikely, but please tell us more about the scenario you're seeing so we can hopefully track it down. (Does this happen repeatedly, or only sometimes?)

The procedure works by looking through all of the nodes/relationships in your database and picking a random sample of them. For example, for each label we might look at 1 out of every 1,000 nodes with that label and check their properties. We check whether constraints are defined, and what the data types of the property values are.
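To make the sampling idea concrete, here is a toy model in Python. It is an assumption-laden sketch, not APOC's implementation: it takes every `sample`-th node per label (deterministic rather than random, to keep the demo predictable) and records which type names appear for each property key. The function name `sample_profile` and its parameters are hypothetical.

```python
from collections import defaultdict


def sample_profile(nodes_by_label, sample=1000):
    """Toy model of sampled type profiling (hypothetical, not APOC code).

    For each label, inspect every `sample`-th node and record the type
    names seen for each property key.
    """
    profile = defaultdict(lambda: defaultdict(set))
    inspected = 0
    for label, nodes in nodes_by_label.items():
        for i, props in enumerate(nodes):
            if i % sample != 0:
                continue  # not part of the sample: skip without reading properties
            inspected += 1
            for key, value in props.items():
                profile[label][key].add(type(value).__name__)
    # Convert to plain dicts so the result is easy to print and compare.
    return {label: dict(keys) for label, keys in profile.items()}, inspected


nodes = {
    "Email": [{"subject": "hi", "size": 512}] * 5000,
    "Person": [{"name": "alice"}] * 2500,
}
profile, inspected = sample_profile(nodes)
print(inspected)  # 8: Email nodes 0, 1000, ..., 4000 plus Person nodes 0, 1000, 2000
print(profile)
```

Only 8 of the 7,500 nodes are read, and the profile holds one small set per (label, property) pair, which is why this kind of scan should not need much heap.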

The error you're seeing says something somewhere has used too much memory but this doesn't make sense, because the "profile" we keep in memory is really tiny (it's here: https://github.com/neo4j-contrib/neo4j-apoc-procedures/blob/4.1/core/src/main/java/apoc/meta/Tables4LabelsProfile.java).

So basically, to blow up memory, the profile has to be really massive (somehow, and this would depend on the data in your database) or your database has to be so short on free heap space that even a small extra amount is too much to allocate.

How could the profile be massive?

Do you have thousands of distinct labels or relationship types? Do you have tens or hundreds of thousands of distinct property names in the database?
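A rough way to see why those questions matter: if the profile is keyed by (label, property name), its size tracks the number of distinct names, not the number of nodes. The sketch below uses a hypothetical `profile_size` helper as a proxy for that; it is an illustration under that keying assumption, not a measurement of APOC's actual data structure.

```python
def profile_size(nodes):
    """Count distinct (label, property key) pairs seen in `nodes`.

    A toy proxy (assumption) for how many entries a per-label property
    profile must hold: it grows with distinct names, not with node count.
    """
    pairs = set()
    for label, props in nodes:
        for key in props:
            pairs.add((label, key))
    return len(pairs)


# 100,000 nodes sharing two property keys: the profile stays tiny.
uniform = [("Email", {"subject": "s", "body": "b"}) for _ in range(100_000)]

# 1,000 nodes that each carry a unique property key: the profile grows
# by one entry per node, which is the "massive profile" case.
sprawling = [("Thing", {f"prop_{i}": i}) for i in range(1_000)]

print(profile_size(uniform))    # 2
print(profile_size(sprawling))  # 1000
```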

How could you otherwise be short on heap?

By running lots of other big transactions at the same time.

Possible Clues:

(This is a bad idea because of how heap auto-allocation works in Neo4j. We recommend setting the initial and max heap to the same value.)

Questions:

tbwiss commented 3 years ago

@moxious It happens repeatedly; in other words, the error rate is 100%. Thanks for the detailed explanation of the procedure!

I'm also not running any other transactions when I execute the procedure. Further, I ran this procedure on a range of databases, from 1k nodes up to 50M nodes, and I only get the OutOfMemoryError on this one database (Enron).

I'll share the link to the database dump with you internally via email.

moxious commented 3 years ago

Update: not able to run this bug to ground. To be continued on an internal trello card here: https://trello.com/c/DkFiQpBz/6474-oom-on-stored-procedure-linked-to-excessive-memory-use-in-an-internal-cursor

moxious commented 3 years ago

Yikes. @tbwiss we investigated this with the internal kernel team and this is the answer that came back:

I ran a consistency check on the DB and it reports 10,100 inconsistent records. Those records cause the cursor to go into an infinite loop and produce objects until it runs out of memory (OOM).

So I'm very sorry to say that this seems to be caused by a database-specific problem: inconsistent data interacting with a situation in the database kernel that is arguably a bug (or arguably not, because the data is bad). The only solution I can offer is "Don't use that database", because it isn't fixable from the APOC side. Sorry.
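The kernel team's explanation (inconsistent records sending a cursor into an infinite loop until it runs out of memory) can be illustrated with a toy model. This is an assumption-based sketch, loosely modeled on records linked by next-pointers; it does not reproduce Neo4j's store format or how the kernel cursor actually works, and the cycle check in `guarded_walk` is one illustrative guard, not what Neo4j does. All names (`naive_walk`, `guarded_walk`, `InconsistentChainError`) are hypothetical.

```python
class InconsistentChainError(Exception):
    """Raised when a record chain points back to a record already visited."""


def naive_walk(next_ptr, start, limit):
    """Follow next-pointers and collect every record, trusting the data.

    `limit` exists only so this demo terminates; an unguarded cursor on a
    cyclic chain would keep appending until the heap is exhausted.
    """
    out = []
    rec = start
    while rec is not None and len(out) < limit:
        out.append(rec)
        rec = next_ptr[rec]
    return out


def guarded_walk(next_ptr, start):
    """Same traversal, but stop with an error on the first revisited record."""
    seen = set()
    out = []
    rec = start
    while rec is not None:
        if rec in seen:
            raise InconsistentChainError(f"record {rec} visited twice")
        seen.add(rec)
        out.append(rec)
        rec = next_ptr[rec]
    return out


healthy = {1: 2, 2: 3, 3: None}
corrupt = {1: 2, 2: 3, 3: 1}  # record 3 wrongly points back to record 1

print(guarded_walk(healthy, 1))         # [1, 2, 3]
print(len(naive_walk(corrupt, 1, 10)))  # 10: the cycle fills the list up to the cap
try:
    guarded_walk(corrupt, 1)
except InconsistentChainError as err:
    print(err)                          # record 1 visited twice
```

On consistent data both walks agree; on the corrupted chain the trusting walk only stops because of the artificial cap, which mirrors why the problem shows up as an OutOfMemoryError rather than as a clear data error.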

tbwiss commented 3 years ago

@moxious thanks for investigating this! We'll have to sync internally as a team and make sure we limit our use of the Enron database.