neo4j / apoc


No results being returned by `apoc.export.json.all` when `writeNodeProperties: true` for a large dataset when writing to a stream #647

Open elan-sfrancies opened 3 months ago

elan-sfrancies commented 3 months ago

I am experiencing issues when running the following query against an on-premises Docker instance of Neo4j Community Edition.

CALL apoc.export.json.all(null, {stream:true, jsonFormat: "JSON_LINES", writeNodeProperties: true})
YIELD file, nodes, relationships, properties, data
RETURN file, nodes, relationships, properties, data

Expected Behavior

The query returns results or displays an error showing why results could not be returned.

Actual Behavior

No results are returned (the following message is seen in the web interface):

(no changes, no records)

How to Reproduce the Problem

Steps

  1. Generate a Neo4j instance with ~100,000 nodes, ~100,000 relationships, and ~17,000,000 properties (a generator sketch appears after this list)
  2. Run the following query:
    CALL apoc.export.json.all(null, {stream:true, jsonFormat: "JSON_LINES", writeNodeProperties: true})
    YIELD file, nodes, relationships, properties, data
    RETURN file, nodes, relationships, properties, data
  3. Observe that results are not returned.

I have tested this behaviour using the .NET driver as well as the browser interface.
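
For reference, a minimal sketch of how a comparable dataset could be generated (this is not from the original report; the :Item label, prop* property names, and :NEXT relationship type are illustrative, and ~170 properties per node gives roughly 17,000,000 properties in total):

// Index so the relationship pass can look nodes up by id quickly.
CREATE INDEX item_id IF NOT EXISTS FOR (n:Item) ON (n.id);

// ~100,000 nodes with ~170 properties each (~17M properties overall).
// In practice, batch this (e.g. with CALL { ... } IN TRANSACTIONS)
// to keep transaction state small.
UNWIND range(0, 99999) AS i
CREATE (n:Item {id: i})
SET n += apoc.map.fromPairs([k IN range(0, 169) | ['prop' + toString(k), i]]);

// ~100,000 relationships: link each node to its successor in a ring.
MATCH (a:Item)
MATCH (b:Item {id: (a.id + 1) % 100000})
CREATE (a)-[:NEXT]->(b);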

Screenshots

The results when writeNodeProperties: false:

[screenshot: WriteNodePropertiesFalse]

The lack of results when writeNodeProperties: true:

[screenshot: WriteNodePropertiesTrue]

Specifications

Memory: 30 GB (memory use appears to increase during the query, topping out at around 9.5 GB and then leveling off)
CPU: 20

Versions

gem-neo4j commented 3 months ago

Hey! Thanks for writing in. I suspect this is an OOM, as APOC does not implement memory tracking. Are you able to check the debug.log file and see if there are errors there? If so, can you attach it here too?

Unfortunately, given how APOC is implemented, this isn't easy for us to fix at this time. My suggestion would be to use one of the other apoc.export.json procedures that let you feed the data in via a Cypher query, so you can control how much data is consumed at a given time.
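
For example, a sketch using apoc.export.json.query, which takes a Cypher query as its data source (the SKIP/LIMIT window is illustrative; the client would page through the graph in bounded slices by varying it):

// Export one bounded slice of the graph at a time.
CALL apoc.export.json.query(
  "MATCH (n) RETURN n ORDER BY id(n) SKIP 0 LIMIT 10000",
  null,
  {stream: true, jsonFormat: "JSON_LINES"}
)
YIELD nodes, relationships, properties, data
RETURN nodes, relationships, properties, data;

Exporting in fixed-size slices keeps the result buffer bounded, at the cost of multiple passes over the store.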