jankotek / mapdb

MapDB provides concurrent Maps, Sets and Queues backed by disk storage or off-heap-memory. It is a fast and easy to use embedded Java database engine.
https://mapdb.org
Apache License 2.0
4.87k stars 872 forks source link

Memory leak when updating large values in StoreDirect #979

Open dgalaktionov opened 3 years ago

dgalaktionov commented 3 years ago

Is version 3 still supported? Because if so, I'd like to report a memory leak in StoreDirect that I've tentatively fixed in my fork.

It occurs when a value that is stored as a linked record is replaced by another value with the same size, and you can easily replicate it if you use larger arrays in the test issue403_store_grows_with_values_outside_nodes, such as 100000 or anything larger than StoreDirectJava.MAX_RECORD_SIZE, to force it use linked records.

On the first update, dataTail would increase by 480 bytes, and then by 360 in every successive update, growing without bounds. As they happen to be multiples of StoreDirect.LONG_STACK_PREF_SIZE, I've figured it had to do with the memory management mechanisms.

After studying the code I can conclude that the problem lies in the fact that we are never actually allowing StoreDirect.longStackNewChunk to reuse chunks that are already free to allocate new stack records. allocateData is called with recursive=true (which according to your own comment may be overkill) and even the method longStackNewChunk is always called with recursive=true, rendering part of its code dead.

Is my fix, I've removed the recursive parameter from that method altogether and call allocateData with recursive=false, thus always allowing to reuse free chunks to allocate space for new records. I'm aware that this may cause longStackNewChunk to eventually recurse, either because taking from the stack may cause us to have to delete an empty stack record or because we need to allocate a new page (or even both), but this recursivity cannot be infinite unless all the stack records are so small that can only hold one value, for some reason (and even then I think it may stop once that stack empties).

With this fix, in my earlier test, I'd still get a 480 byte increase after the first update (which is unavoidable if every stack is initially empty), but then dataTail would remain stable for every next update, of any value, as long as the size of the new value is the same. I've ran the other tests and everything seems to be working as previously.

Is there any edge case I could have missed or consideration I've failed to make? I could gladly make a PR otherwise.

P.S.: Yes, I'm aware that StoreDirect.compact() would get rid of the leaked overhead, but ideally I'd prefer to avoid unnecessary leaks altogether, since I'm never deleting entries from a BTree in my application. More so considering that it can easily trigger an OOM for large stores, but that'd be a separate issue that I've fixed as well by letting it use a file.