neo4j / neo4j-javascript-driver

Neo4j Bolt driver for JavaScript
https://neo4j.com/docs/javascript-manual/current/
Apache License 2.0

Memory leak in transactions with many queries #1186

Open jroith opened 7 months ago

jroith commented 7 months ago

I have a situation where the driver can cause Node.js to run out of memory if a transaction with many queries is run (700k very simple ones in my case). While it may seem unreasonable to execute so many queries instead of one large query, the driver should consume more or less constant memory if possible.
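
For reference, the usage pattern that triggers this looks roughly like the following (the connection details and query are hypothetical; any explicit transaction with enough queries will do):

```ts
import neo4j from 'neo4j-driver'

// Hypothetical connection details; any reachable Neo4j instance will do.
const driver = neo4j.driver('bolt://localhost:7687', neo4j.auth.basic('neo4j', 'password'))

async function main (): Promise<void> {
  const session = driver.session()
  const tx = session.beginTransaction()
  try {
    for (let i = 0; i < 700_000; i++) {
      // Each run() is awaited, so at most one query is in flight at a time,
      // yet the process still runs out of memory as the transaction grows.
      await tx.run('CREATE (:Item {i: $i})', { i })
    }
    await tx.commit()
  } finally {
    await session.close()
    await driver.close()
  }
}

main().catch(console.error)
```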

The source of the leak is the `_results` array, which grows with every call to `run()`:

https://github.com/neo4j/neo4j-javascript-driver/blob/6f72b3ddf39e82c220ff4c6d7dd21cda8bd11d05/packages/core/src/transaction.ts#L212
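
In simplified form, the pattern at that line is roughly this (a sketch only; `_dispatchRun` and the `Result` interface are stand-ins for the driver's internals, not its real API):

```ts
// Minimal stand-in for the driver's Result type, just for this sketch.
interface Result {
  summary (): Promise<unknown>
}

class Transaction {
  // Every result this transaction has ever produced is retained here
  // until commit/rollback, so memory grows with the number of queries.
  private readonly _results: Result[] = []

  run (query: string, parameters?: object): Result {
    const result = this._dispatchRun(query, parameters)
    this._results.push(result)
    return result
  }

  // Placeholder for the real Bolt dispatch logic.
  private _dispatchRun (query: string, parameters?: object): Result {
    void query; void parameters
    return { summary: async () => ({}) }
  }
}
```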

It seems the reason this array exists in the first place is to allow `isOpen()` to be checked in the `_onErrorCallback` and to await the `summary()` of each result in `pendingResults`.

While I don't know exactly why it has to be done that way, I would like to suggest a simple fix. In most cases the call to `run()` will either have been awaited or, if subscribed to instead, its promise will have resolved soon afterwards (in practice often before the next query, and certainly long before a much later one). The results do not appear to need to be ordered, so it should be possible to turn `_results` into a `Set` instead of an array and to drop each result from that set as soon as its summary is available.

That way the size of the set would, in my case, likely stay around 1, and in most other cases it would still be prevented from growing without bound and eventually crashing Node.js.
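
Concretely, the idea would be something like this (again a sketch with hypothetical stand-ins, not a patch against the real driver code):

```ts
// Minimal stand-in for the driver's Result type, just for this sketch.
interface Result {
  summary (): Promise<unknown>
}

class Transaction {
  // A Set instead of an array: result order is not needed, and entries
  // can be dropped individually once they are fully consumed.
  private readonly _results = new Set<Result>()

  run (query: string, parameters?: object): Result {
    const result = this._dispatchRun(query, parameters)
    this._results.add(result)
    // Once the summary settles, the result has been fully consumed and no
    // longer needs tracking; commit/rollback only has to await the rest.
    result.summary().then(
      () => this._results.delete(result),
      () => this._results.delete(result)
    )
    return result
  }

  // Placeholder for the real Bolt dispatch logic.
  private _dispatchRun (query: string, parameters?: object): Result {
    void query; void parameters
    return { summary: async () => ({}) }
  }
}
```

With awaited queries the set would hover around size 1; results that are never consumed simply stay in the set until commit/rollback, exactly as before.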

bigmontz commented 6 months ago

Hey @jroith, the results are consumed before commit, because if you don't consume a result, the database might commit partial changes (ignoring the non-consumed parts).
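
In other words, the commit path roughly does the following before sending `COMMIT` (a sketch of the behavior described here, not the driver's actual code):

```ts
// Awaiting each pending result's summary forces it to be fully consumed,
// so the server never commits with queries only partially streamed.
async function consumePendingResults (
  pendingResults: Array<{ summary (): Promise<unknown> }>
): Promise<void> {
  await Promise.all(pendingResults.map(result => result.summary()))
}
```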

Observing each result and removing it from the pending-results list when it completes could solve this issue. However, the user would still be able to mess everything up if they never tried to consume the results at all.

I will push this to the backlog, so it can be addressed in the future.

jroith commented 6 months ago

@bigmontz The user would not really be able to mess things up, because any results that were not removed from the pending list would still be there and could still be resolved by the commit/rollback callback. Unless you mean that the user can cause unbounded memory use in those cases: that's true, but it is kind of expected, and all other cases would be handled.

Would you possibly accept a PR?

bigmontz commented 6 months ago

Sure, can you take a look at our contributing guides?