salortiz / LMDB_File

Perl wrapper around the OpenLDAP's LMDB

Possible memory leak while reading #21

Closed akotlar closed 7 years ago

akotlar commented 7 years ago

Behavior: Simply calling get brings RES and SHR up to the size of the database on disk.

Example: My database is 3.6G. I read every key. Somewhere during that process, I/O wait time drops to 0% and stays there (read ahead). At that point RES = 3.6g SHR = 3.6g. Over the course of the rest of the run, RES grows to 3.7G. This could be a rounding issue.

This is a simple case with no chance of a closed-over variable maintaining a reference. The behavior occurs with the flags MDB_NOTLS | MDB_NOMETASYNC | MDB_NOLOCK | MDB_NOSYNC | MDB_RDONLY (I think NOLOCK pretty much obviates all the others) and with no flags set, so it occurs both with MDB_RDONLY and without.

Code:


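# Read every key once; the returned data is discarded immediately.
for my $posKey (0 .. $lastPosKey) {
  $db->dbReadOne('databaseName', $posKey);
}

sub dbReadOne {
  my ($self, $dbName, $key) = @_;
  my $db = $self->_getDbi($dbName);

  # Read-only transaction; the $txn object goes out of scope on return.
  my $txn = $db->{env}->BeginTxn(MDB_RDONLY);

  # $data is a lexical, so no reference to the value survives this call.
  $txn->get($db->{dbi}, $key, my $data);

  return undef;
}

hoytech commented 7 years ago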

Sorry, I haven't had a chance to look at this yet.

Just to clarify: you are expecting the RSS to grow by 3.6g because that is the size of your data-set and you are touching the whole data-set; however, the RSS actually grows by 3.7g, implying a memory leak of around 100m. Is that correct?

akotlar commented 7 years ago

@hoytech No worries. I don't have any expectation of this being addressed; you're doing this in your free time.

I expect, perhaps naively, that RES will grow only in proportion to the amount of data my Perl program assigns to variables or holds references to.

In the above example, as soon as the data is retrieved from the database it should be marked as available for garbage collection. However, garbage collection is either delayed (I presume because Perl decides this is the best way to preserve CPU cycles, allowing RES to grow to 3.6G) or prevented (by some memory leak). I don't know which scenario is true.

hoytech commented 7 years ago

> RES will grow only in proportion to the amount of data my Perl program assigns to variables or holds references to

No, RES/RSS corresponds to the number of pages mapped into your process that have valid page table entries. If they have page table entries, this implies they are resident in memory either as anonymous memory (i.e. the heap, stack, etc.) or in the page cache (i.e. shared libraries, LMDB databases).
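To see the anonymous versus page-cache split described above, one option on Linux is to read the per-process counters in /proc/self/status. The sketch below is only an illustration and assumes a kernel recent enough to report RssAnon, RssFile and RssShmem; the helper names parse_rss_fields and current_rss_fields are made up for this example and are not part of LMDB_File.

```perl
use strict;
use warnings;

# Extract RSS fields (in kB) from /proc/<pid>/status-formatted text.
# VmRSS is the total; RssAnon, RssFile and RssShmem are its components
# on kernels that report them.
sub parse_rss_fields {
    my ($status_text) = @_;
    my %kb;
    while ($status_text =~ /^(VmRSS|RssAnon|RssFile|RssShmem):\s+(\d+)\s+kB/mg) {
        $kb{$1} = $2;
    }
    return \%kb;
}

# Linux only (assumption): returns an empty hash ref if /proc is unavailable.
sub current_rss_fields {
    open(my $fh, '<', '/proc/self/status') or return {};
    local $/;    # slurp the whole file
    my $text = <$fh>;
    close $fh;
    return parse_rss_fields(defined $text ? $text : '');
}

my $rss = current_rss_fields();
for my $field (qw(VmRSS RssAnon RssFile RssShmem)) {
    printf "%-9s %s kB\n", $field, $rss->{$field} // 'n/a';
}
```

Pages of a file-backed mapping such as an LMDB database should show up under RssFile rather than RssAnon, which is why a process that has read a whole database can report a large RES while holding very little heap.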

If your program iterates over an entire LMDB mapping, it creates page table entries for each page in the DB, which reference the pages in the filesystem cache that are backing the LMDB database. Therefore the RSS will increase by the entire size of the DB. The RSS will decrease immediately if you munmap() the mapping by closing the LMDB environment.
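One way to check how much of the RSS belongs to the database mapping itself is to sum the Rss lines for that mapping in /proc/self/smaps. This is a rough, Linux-only sketch; rss_kb_for_mapping and read_self_smaps are hypothetical helper names, and it assumes the environment uses LMDB's default data file name, data.mdb.

```perl
use strict;
use warnings;

# Sum the resident size (kB) of every mapping in smaps-formatted text whose
# backing path ends with $suffix (e.g. 'data.mdb').
sub rss_kb_for_mapping {
    my ($smaps_text, $suffix) = @_;
    my ($total_kb, $in_match) = (0, 0);
    for my $line (split /\n/, $smaps_text) {
        # Mapping header: "start-end perms offset dev inode [path]"
        if ($line =~ /^[0-9a-f]+-[0-9a-f]+\s+\S+\s+\S+\s+\S+\s+\d+\s*(.*)$/) {
            my $path = $1;
            $in_match = ($path =~ /\Q$suffix\E$/) ? 1 : 0;
        }
        elsif ($in_match && $line =~ /^Rss:\s+(\d+)\s+kB/) {
            $total_kb += $1;
        }
    }
    return $total_kb;
}

# Read the current process's smaps (Linux only); returns '' elsewhere.
sub read_self_smaps {
    open(my $fh, '<', '/proc/self/smaps') or return '';
    local $/;    # slurp the whole file
    my $text = <$fh>;
    close $fh;
    return defined $text ? $text : '';
}

# Hypothetical usage: call this after reading keys, and again after
# closing the environment, to compare the two figures.
printf "LMDB mapping resident: %d kB\n",
    rss_kb_for_mapping(read_self_smaps(), 'data.mdb');
```

If the explanation above holds, this figure should climb toward the database size while keys are being read, and drop to zero once the environment is closed and the mapping is removed.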

This is one of several reasons why measuring the memory usage of a process with RSS is problematic.

The increase in RSS isn't really anything to worry about, since these pages aren't really "owned" by your process but by the filesystem cache. The OS will gradually remove them from the filesystem cache if it becomes necessary due to memory pressure. This is an operation performed by the OS's virtual memory system, not the Perl garbage collector. If you wish to tell the OS to "hurry up" and free that memory from the filesystem cache immediately, you can check out my utility vmtouch: https://hoytech.com/vmtouch/ (note that vmtouch usually doesn't have an effect on pages that have populated PTEs in existing processes, though).

I've created a quick Perl script to demonstrate what I just described:

https://gist.github.com/hoytech/051de5ebc2a090d7bdc81c074eeb0b5e

Hope this helps.

akotlar commented 7 years ago

Thank you. This seems to be resolved.

akotlar commented 7 years ago

By the way, wanted to comment that I found this tool quite useful. Thanks for making it.

hoytech commented 7 years ago

Glad to hear, thanks for letting me know!