sukri12 / pysal

Automatically exported from code.google.com/p/pysal
Other
0 stars 0 forks source link

Reference Cycles in FileIO #190

Closed GoogleCodeExporter closed 8 years ago

GoogleCodeExporter commented 8 years ago
by_col and by_row introduce reference cycles because these classes both point 
back to their parent. When their parent is dereferenced the classes maintain a 
reference to their parent and thus if can't be removed through normal reference 
counting and must wait until it's picked up by garbage collection. This can 
cause problems when users fail to call the appropriate close method as file 
object may remain open until garbage collection removes them.  Issue #187 fixed 
a small bug that prevented even gc from cleaning them up.

by_col and by_row were changed to be accessed through properties, so this only 
becomes an issue when either is actually used.

See #187 for more details on the issue.

Original issue reported on code.google.com by schmi...@gmail.com on 31 Jan 2012 at 4:28

GoogleCodeExporter commented 8 years ago
How about using Python's built-in "with" statement to handle file opening and 
closing? http://effbot.org/zone/python-with-statement.htm

Original comment by phil.stp...@gmail.com on 31 Jan 2012 at 11:44

GoogleCodeExporter commented 8 years ago
Context managers do appear to be useful, but aren't yet common place in python 
code. Even if we support them (by defining __enter__ and __exit__ methods) we 
can't guarantee people will use them. Also while this would address the open 
file issue (when "with" is used), it will not address the reference cycle 
problem, which can lead to memory leaks and just slower running code (as 
garbage collection takes time).

We should however add support for the with statement in FileIO as a new feature.

Original comment by schmi...@gmail.com on 1 Feb 2012 at 12:25

GoogleCodeExporter commented 8 years ago
caching the _By_Row object in FileIO.py turns out to be unnecessary.  Creating 
the _By_Row object involves less overhead than checking for it in a dictionary 
and returning it. I confirmed this by profiling with and without caching. 
Without caching is just slightly faster. The same is true for _By_Col in 
Tables.py

By not caching it, there is no cleanup necessary and we avoid reference cycles.

Fixed in r1282

Original comment by schmi...@gmail.com on 26 Jul 2012 at 3:24