Data Vault memory leak

pomalley commented 9 years ago

[Screenshot from 2015-06-30 16:55:50]

pomalley commented 9 years ago

bump:

[Image: skynet-mem]

maffoo commented 9 years ago

@amitsv1 pointed out a numpy issue that may be the cause of the problem, namely a memory leak in numpy.loadtxt: https://github.com/numpy/numpy/issues/651

maffoo commented 9 years ago

I have a branch that eliminates the call to numpy.loadtxt, but for now I am unable to repro the issue: https://github.com/martinisgroup/servers/tree/u/maffoo/fix-datavault-leak

amitsv1 commented 9 years ago

I'm able to reproduce the results in the top comment of numpy/numpy#651 (RSS is ~2GB after del arr and gc.collect). Replacing the loadtxt with your code in fix-datavault-leak also results in leaked memory (500MB RSS after doing del arr and collect... different dtype?). My desktop is running Python 2.7.6-8ubuntu0.2 and numpy 1:1.8.2-0ubuntu0.1 on Ubuntu 14.04.3 (identical software configuration to skynet). I have not tried reproducing the leak within the data vault code.

ejeffrey commented 9 years ago

I pushed a branch u/ejeffrey/dv_log_memory that logs the virtual size and resident size every time a dataset is opened, created, or closed. This should let us see when the offending memory allocation happens. Until we have better logging (see pylabrad issue https://github.com/labrad/pylabrad/issues/156), just run the datavault from the command line with the --auto parameter and redirect its output to a logfile:

$ data_vault_multihead.py --auto > /log/file.txt
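A minimal sketch of the kind of memory logging described above, assuming Linux (it reads /proc/self/status). The names parse_proc_status and log_memory are hypothetical; this is not the code in the u/ejeffrey/dv_log_memory branch.

```python
import logging

def parse_proc_status(text):
    """Return (VmSize, VmRSS) in kB from the text of /proc/<pid>/status.

    Fields that are missing come back as None.
    """
    values = {}
    for line in text.splitlines():
        key, _, rest = line.partition(':')
        if key in ('VmSize', 'VmRSS') and rest.split():
            values[key] = int(rest.split()[0])  # the kernel reports these in kB
    return values.get('VmSize'), values.get('VmRSS')

def log_memory(event, path='/proc/self/status'):
    """Log virtual and resident size, e.g. log_memory('opened dataset')."""
    with open(path) as f:
        vsize, rss = parse_proc_status(f.read())
    logging.info('%s: VmSize=%s kB, VmRSS=%s kB', event, vsize, rss)
    return vsize, rss
```

Calling log_memory from the open/create/close paths gives one line per event. Note that the logging module writes to stderr by default, so with the redirect above you would either add 2>&1 or call logging.basicConfig(stream=sys.stdout, level=logging.INFO).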


maffoo commented 9 years ago

So I did some tests on matrix-reloaded and was able to reproduce the issue from the numpy bug. Then I tried the following two functions to load the array.txt file created as in that issue (I converted it to floats since that's what we use in the datavault, but it doesn't make much difference either way):

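import numpy as np

def load1(fname):
    # One plain Python list of floats per row; vstack converts them all at once.
    with open(fname) as f:
        return np.vstack([[float(n) for n in line.split()]
                          for line in f])

def load2(fname):
    # Same, but wraps every row in a temporary numpy array first.
    with open(fname) as f:
        return np.vstack([np.array([float(n) for n in line.split()])
                          for line in f])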

Here are the results of some interactive sessions, with the memory usage reported by top shown in the comments:

#   PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
# 22472 maffoo    20   0  190472  25600   6152 S   0.0  0.0   0:27.27 ipython
In [1]: a = load1('array.txt')
# 22472 maffoo    20   0  581120 416248   6152 S   0.0  0.2   0:36.60 ipython
In [2]: del a
# 22472 maffoo    20   0  190492  25620   6152 S   0.0  0.0   0:36.60 ipython

In [3]: a = load2('array.txt')
# 22472 maffoo    20   0  971736 806860   6152 S   0.0  0.3   0:45.65 ipython
In [4]: del a
# 22472 maffoo    20   0  581108 416232   6152 S   0.0  0.2   0:45.65 ipython
In [5]: import gc; gc.collect()
Out[5]: 0
# 22472 maffoo    20   0  190492  25620   6152 S   0.0  0.0   0:45.70 ipython

There's obviously a lot of extra garbage created by load2, presumably due to the temporary numpy arrays created for each row. However, in neither case does there appear to be a leak, as forcing a gc.collect() gets us back to the original memory usage. I will modify the code in the u/maffoo/fix-datavault-leak branch to avoid creating these temporary arrays.
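One way to avoid both the per-row numpy arrays of load2 and the per-row Python lists that load1 hands to np.vstack is to append every value to a single packed buffer and reshape once at the end. This is a sketch assuming a whitespace-delimited file of floats with a constant column count, as in array.txt; load_flat is a hypothetical name and not necessarily what the u/maffoo/fix-datavault-leak branch does.

```python
import array
import numpy as np

def load_flat(fname):
    """Parse a whitespace-delimited text file of floats into a 2-D float64 array."""
    buf = array.array('d')  # packed C doubles: 8 bytes per value, no per-row objects kept
    ncols = None
    with open(fname) as f:
        for line in f:
            fields = line.split()
            if not fields:
                continue  # skip blank lines
            if ncols is None:
                ncols = len(fields)
            elif len(fields) != ncols:
                raise ValueError('expected %d columns, got %d' % (ncols, len(fields)))
            buf.extend(float(n) for n in fields)
    if ncols is None:
        return np.empty((0, 0))
    # frombuffer shares memory with buf (no copy); the returned view keeps buf alive.
    return np.frombuffer(buf, dtype=np.float64).reshape(-1, ncols)
```

Each float is still parsed into a short-lived Python object, but nothing per row survives past the loop, and the final array reuses the buffer instead of copying it.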

maffoo commented 5 years ago

Closing since this particular leak seems to be fixed.

labrad / servers

Data Vault memory leak #198