Open numpy-gitbot opened 12 years ago
trac user tanriol wrote on 2012-08-09
BTW, as 1.6.2 was not available in the 'Version' field, 1.6.1 was selected
@rgommers wrote on 2012-08-10
I can reproduce the memory usage, but executing gc.collect() before and after the loadtxt call shows there are no new unreachable objects being created. Also, after exiting the interpreter the memory is freed. Hence this isn't a memory leak.
The gc.collect docs say that some objects, in particular ints and floats, aren't freed by it. So it looks like that is the case here.
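The check described above can be sketched like this (a minimal, self-contained version: the original report used a large on-disk CSV, so the in-memory data here is a stand-in):

```python
import gc
import io

import numpy as np

# Small in-memory CSV standing in for the large on-disk file
# from the original report.
csv_data = "\n".join(
    ",".join(str(i * 10 + j) for j in range(3)) for i in range(1000)
)

gc.collect()  # drain any pre-existing garbage first

arr = np.loadtxt(io.StringIO(csv_data), delimiter=",")

# gc.collect() returns the number of unreachable objects it found;
# a zero (or near-zero) result after loadtxt means no reference-cycle
# garbage was created, i.e. no classic memory leak.
unreachable = gc.collect()

print(arr.shape)       # (1000, 3)
print(unreachable)
```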
trac user tanriol wrote on 2012-08-11
Do you mean that this kind of excessive memory consumption is likely unfixable in numpy? If so, are there any efficient workarounds to load CSV files without such overconsumption?
@rgommers wrote on 2012-08-11
The memory consumption shouldn't be a problem as long as the memory is freed before you start swapping to disk. This should be handled by the memory allocator of your OS.
If you do see swapping, it is a serious problem. IIRC pandas has a different loadtxt function (implemented in C) which you could try.
trac user tanriol wrote on 2012-08-12
Yes, swapping is observed with the real data files (which are larger). Pandas leaks less memory, but still too much. Probably I'll have to use pandas with its lazy chunk-by-chunk loading, as neither loadtxt nor genfromtxt seems able to build the numpy array chunk-by-chunk.
@rgommers wrote on 2012-08-12
Then it's probably a Python issue; see these links:
http://pushingtheweb.com/2010/06/python-and-tcmalloc/
http://hg.python.org/cpython/rev/f8a697bc3ca8 (the claim is that the issue is fixed/improved in Python 3.3)
http://mail.scipy.org/pipermail/numpy-discussion/2011-May/056427.html
Not sure if rebuilding Python against tcmalloc is possible and worth it for you. If that does solve the issue, that would be good to know.
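On Linux, a full rebuild may not be necessary: tcmalloc can often be swapped in via `LD_PRELOAD`. This is an environment-dependent sketch, not a tested recipe: the library path varies by distribution, and `my_script.py` is a placeholder.

```shell
# Preload tcmalloc so the Python process's allocations go through it.
# The .so path is distro-dependent (this one is a Debian/Ubuntu-style guess);
# tcmalloc must be installed, e.g. via the gperftools package.
LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libtcmalloc.so.4 python my_script.py
```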
Original ticket http://projects.scipy.org/numpy/ticket/2198 on 2012-08-09 by trac user tanriol, assigned to unknown.
The amount of memory leaked far exceeds the amount the loaded data takes and does not go away when the loaded array is deleted.
CPython 2.7.3, Numpy 1.6.2