junxiemq / netcdf4-python

Automatically exported from code.google.com/p/netcdf4-python

Using Variable instance in expression leaks memory #185

Closed GoogleCodeExporter closed 8 years ago

GoogleCodeExporter commented 8 years ago
What steps will reproduce the problem?
1. Run repro.py. It will print increasing memory usage by the process.

What is the expected output? What do you see instead?
When using Variable instances in expressions, I expect the memory usage of the 
process not to increase.

What version of the product are you using? On what operating system?
python-2.7.3
netcdf4_python-0.9.8
netcdf-4.2.1.1
hdf5-1.8.9

Please provide any additional information below.
Using a netCDF4.Variable instance in an expression with a numpy.ndarray seems 
to leak memory. Explicitly casting a Variable to an ndarray also seems to leak 
memory. Subscripting a Variable instance to get an ndarray doesn't leak 
memory, and using the resulting array is also much faster than using the 
Variable instance. See the commented code at the bottom of repro.py.

Original issue reported on code.google.com by tjalli...@gmail.com on 5 Jun 2013 at 9:44

GoogleCodeExporter commented 8 years ago
When you use the netCDF variable object like a numpy array, Python performs 
the slicing operation on it each time you perform an operation inside the loop. 
This reads a fresh copy of the data each time, hence the slowness and the 
memory usage.

I don't see this as a bug, but I could be wrong. I also don't see that I can do 
much about it. Reading the data out of the variable before the loop seems 
like the right solution to this problem.
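
A minimal sketch of the difference described above, not taken from repro.py. `CountingVariable` is a hypothetical stand-in for a netCDF4.Variable that only assumes slicing returns a fresh copy; it uses plain lists so the example runs without netCDF4 or numpy installed.

```python
class CountingVariable:
    """Hypothetical stand-in for a netCDF4.Variable.

    Assumption: every slice copies the data out, as a read from disk would.
    """

    def __init__(self, values):
        self._values = list(values)
        self.reads = 0  # how many times the data was copied out

    def __getitem__(self, key):
        self.reads += 1
        return self._values[key]  # slicing a list returns a new list


def sum_reading_each_iteration(var, iterations):
    """Slice the variable inside the loop: one read and one copy per pass."""
    total = 0.0
    for _ in range(iterations):
        total += sum(var[:])
    return total


def sum_reading_once(var, iterations):
    """Slice the variable once before the loop and reuse the copy."""
    data = var[:]
    total = 0.0
    for _ in range(iterations):
        total += sum(data)
    return total
```

Both functions return the same result, but the first one triggers one read per iteration while the second triggers exactly one read in total.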

Original comment by whitaker.jeffrey@gmail.com on 5 Jun 2013 at 11:38

GoogleCodeExporter commented 8 years ago
I see, thank you for the info. In either case, it seems to me that the memory 
usage should be more or less constant. But what I see is that the memory usage 
increases linearly with the number of iterations. See the attached pdf, based 
on the results of running repro.py.

The red dots represent the memory usage during iterations when slicing the 
Variable instance before the iteration. The blue-ish dots represent the memory 
usage during iterations when using the Variable instance directly in the 
expression. That suggests that some memory isn't released, don't you think?
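
One way to check for this kind of linear growth is to sample memory after every iteration. The sketch below is an illustration only and is not how repro.py measures memory. It assumes Python 3's `tracemalloc` module, which tracks allocations made through Python's allocator and may not see memory allocated inside C libraries such as HDF5. `leaky_step` and `steady_step` are hypothetical workloads.

```python
import tracemalloc


def sample_traced_memory(step, iterations):
    """Call step() repeatedly; return traced bytes in use after each call."""
    tracemalloc.start()
    try:
        samples = []
        for _ in range(iterations):
            step()
            current, _peak = tracemalloc.get_traced_memory()
            samples.append(current)
        return samples
    finally:
        tracemalloc.stop()


retained = []  # hypothetical leak: references that are never dropped


def leaky_step():
    """Keeps every buffer alive, so traced memory grows each iteration."""
    retained.append(bytearray(100_000))


def steady_step():
    """Allocates a buffer that is freed as soon as the call returns."""
    bytearray(100_000)
```

If the samples for a workload keep rising by a similar amount per iteration, something is holding on to the data; if they level off, the copies are being released.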

Thanks,
Kor

Original comment by tjalli...@gmail.com on 5 Jun 2013 at 3:48

GoogleCodeExporter commented 8 years ago
I guess the numpy arrays created when you slice the variable within the loop 
are not cleaned up by the Python garbage collector (I guess they do not go out 
of scope?). That memory is not under the control of the netcdf module, so 
there's nothing I can do about it.

Original comment by whitaker.jeffrey@gmail.com on 5 Jun 2013 at 4:51

GoogleCodeExporter commented 8 years ago
Makes sense, thanks.
Kor

Original comment by tjalli...@gmail.com on 5 Jun 2013 at 8:56

GoogleCodeExporter commented 8 years ago

Original comment by whitaker.jeffrey@gmail.com on 26 Feb 2014 at 2:04