Xylem closed this issue 9 years ago.
Excellent report. I will test out your changes in our environment and see if we have any side effects, or hopefully confirm reduced memory usage in long-running processes. If all goes well, I will push out a new release next week.
Thank you again. Great write-up.
For testing purposes, I recommend running `SELECT` queries that return a large number of rows. Due to the way `ODBCResult::FetchAll()` and its counterparts are implemented, after fetching each row, `uv_queue_work()` queues `UV_FetchAll()` and `UV_AfterFetchAll()` (the latter being one of the affected functions) once again to retrieve the next row, until the result set is exhausted, or at least that is what I gather from a brief analysis of the code. This approach should result in the fastest memory usage growth, as a single call to `FetchAll()` should create `number_of_rows + 1` handles. This coincides with the behavior observed in our applications: the ones making primarily `SELECT` queries returning large numbers of rows were affected much more (~200 MB of memory usage growth per hour) than those dealing mainly with `INSERT`s.
Calling `HandleScope.Close()` is unnecessary when its return value is not returned to a calling function. Additionally, it leaks V8 handles, which causes major memory leaks over time. The issue becomes apparent mainly in long-running and database-intensive applications.

Reproduction
The minimal program I used to reproduce this issue depends, apart from `odbc`, on `async@0.9.0` and `heapdump@0.3.5`. In my case, I tested the issue against an instance of Postgres running on the same VM. While the program is running, V8 heap dumps can be generated by sending the process a `SIGUSR2` signal (e.g. `kill -USR2 <pid>`), which is `heapdump`'s trigger for writing a snapshot.
With the version of `odbc` without these changes, the size of the heap dumps of the running process should increase by ~3-5 MB per minute. When loaded into Chrome Dev Tools, however, the reported heap size for each of the generated files should not exceed ~2 MB. Since a heap dump is simply a JSON file, it can be analyzed manually, which I did. What follows is a description of my analysis of the following heap dump: https://drive.google.com/file/d/0B1gxZN-e6g8ZR09FNGk4TVFVaUE/view?usp=sharing

In the `nodes` array, close to the beginning of the file, you should be able to see a line similar to the following: `,3,9,23,0,3792103`. The fields, from left to right, are `type` (with `3` being `object`), `name` (an index into the `strings` table, in this case `(Handle scope)`), the `id` of the node, the `self_size` of the node and, finally, `edge_count`, the number of edges outgoing from that node.

Further in the file, in the `edges` array, there should be a significant number of lines similar to `,3,3415,80`. The fields, from left to right, are `type` (with `3` being `internal`), `name_or_index` (again referencing an index into the `strings` array; contrary to the previous case, the index varies between runs, but it always points to the string `undefined_value`) and `to_node`, the `id` of the node the edge connects to. I don't know the exact format of the heap dump, but I assume that the origin node of each edge is determined by the ordering of nodes in the `nodes` array and their reported `edge_count`s. In the heap dump I'm describing, there were, in total, 3792107 edges referencing the `undefined` value, which matches very closely the number of edges outgoing from the `(Handle scope)` node.

For comparison, here's a heap dump of the same program after applying my changes: https://drive.google.com/file/d/0B1gxZN-e6g8ZUkNWc1VOUTZ1eFk/view?usp=sharing
Cause
The mechanism of `HandleScope`s is based on a stack. When a new `HandleScope` is created, it is put on top of the scope stack. When `Handle<T>` objects (as well as `Local<T>`) are created, they are referenced by the `HandleScope` currently at the top of the scope stack. Once that `HandleScope` is removed from the scope stack (e.g. when the function it was created in as a local variable returns), it takes care of disposing the handles that exist within it. This is why it is unsafe to simply return `Handle<T>` objects from a function: since they belonged to a `HandleScope` that was deleted, whatever they pointed to may no longer exist. This is what the `HandleScope.Close()` function is for: it takes in a handle from the scope being closed and, after removing the scope from the scope stack, creates a new handle to the same object. Since the scope we just closed is no longer on the scope stack, the new handle is created within the `HandleScope` further down the stack, so the caller can safely receive it via `return scope.Close(result);`.

In the cases that were causing the issue, however, `HandleScope.Close()` was being called without its return value being used.
The handle created by the `HandleScope.Close()` call is thus not returned anywhere. This is particularly problematic in all the `UV_After<Operation>` functions: once a `UV_<Operation>` function exits and `libuv` calls `UV_After<Operation>` back on the main application thread, the only scope present on the handle scope stack is, I assume, the global handle scope (or another scope that exists throughout the entire lifetime of the process), which causes the handles created by `HandleScope.Close()` to exist until the process is terminated.

As for the specifics of the memory leak created by the issue, I'm not entirely sure. Part of it is certainly the bookkeeping for all those handles. I also suspect that V8's garbage collector is simply being crippled by the sheer number of handles it has to keep track of, to the point that it is no longer able to cope with freeing memory allocated by other objects that, under normal conditions, wouldn't cause problems.
Solution
I removed all calls to `HandleScope.Close()` where the return value was unused. There are no observable ill effects from these changes.