machinalis / iepy

Information Extraction in Python
BSD 3-Clause "New" or "Revised" License

Memory usage of rules.py #65

Closed: rafacarrascosa closed this issue 9 years ago

rafacarrascosa commented 9 years ago

rules.py has a number of caches for performance. At least one of these caches is still prone to memory "leaks"[0], namely cached_segment_enriched_tokens; I verified this using a memory profiler.

Besides fixing this, it would be good to review every function in this module to double-check that there are no other potential leaks (another cache "leak" in this module was fixed two weeks ago).
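One way to start such a review is a quick scan of the module's globals for likely cache holders. The sketch below is a heuristic under stated assumptions: it flags functools.lru_cache wrappers whose maxsize is None (unbounded) and module-level dicts or sets, a common shape for hand-made caches. The helper find_cache_suspects and the demo module are hypothetical; caches stored on classes or closures would not be found this way.

```python
import functools
import types

def find_cache_suspects(module):
    """List module-level names that may hold an unbounded cache (heuristic)."""
    suspects = []
    for name, obj in vars(module).items():
        if name.startswith("__"):
            continue  # skip module metadata such as __name__ and __doc__
        # functools.lru_cache wrappers expose cache_info(); maxsize None means no bound.
        if callable(obj) and hasattr(obj, "cache_info"):
            if obj.cache_info().maxsize is None:
                suspects.append((name, "unbounded lru_cache"))
        # A module-level dict or set is a common shape for a hand-made cache.
        elif isinstance(obj, (dict, set)):
            suspects.append((name, type(obj).__name__))
    return suspects

# Hypothetical stand-in module; in practice you would pass the imported rules module.
demo = types.ModuleType("demo_rules")

@functools.lru_cache(maxsize=None)
def unbounded(x):
    return x * 2

@functools.lru_cache(maxsize=128)
def bounded(x):
    return x * 2

demo.unbounded = unbounded
demo.bounded = bounded
demo._token_cache = {}

print(find_cache_suspects(demo))
# [('unbounded', 'unbounded lru_cache'), ('_token_cache', 'dict')]
```

Anything flagged still needs a manual look: a module-level dict may be a constant lookup table rather than a growing cache.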

[0] More precisely, unbounded caching.

rafacarrascosa commented 9 years ago

I committed a fix for this, replacing a hand-made cache with a size-limited lru_cache. The replaced code carried a warning against using lru_cache for this task, so I wrote some tests to check that the adverse effect did not occur before switching to lru_cache.
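A minimal sketch of this kind of fix, assuming the cached function can be keyed by a hashable segment id. The function name, the stand-in computation and MAX_CACHED_SEGMENTS are hypothetical; the actual bound used in the commit is not stated here. Note that lru_cache limits the number of entries (maxsize), not bytes, so memory stays bounded only if each entry has a bounded size. The issue does not say what adverse effect the old warning referred to, so the tests below only check generic properties: the cache stays bounded and cached results match fresh ones.

```python
from functools import lru_cache

# Hypothetical bound on the number of cached entries.
MAX_CACHED_SEGMENTS = 1000

@lru_cache(maxsize=MAX_CACHED_SEGMENTS)
def cached_enriched_tokens(segment_id):
    # Stand-in for the real token enrichment. Returning a tuple keeps callers
    # from mutating the shared cached object.
    return tuple(f"token-{segment_id}-{i}" for i in range(3))

def test_cache_stays_bounded():
    cached_enriched_tokens.cache_clear()
    for seg in range(10 * MAX_CACHED_SEGMENTS):
        cached_enriched_tokens(seg)
    info = cached_enriched_tokens.cache_info()
    # Least recently used entries are evicted once the bound is reached.
    assert info.currsize == MAX_CACHED_SEGMENTS
    assert info.misses == 10 * MAX_CACHED_SEGMENTS

def test_cached_result_matches_fresh_computation():
    cached_enriched_tokens.cache_clear()
    first = cached_enriched_tokens(42)   # miss: computed and stored
    second = cached_enriched_tokens(42)  # hit: served from the cache
    assert first == second
    assert cached_enriched_tokens.cache_info().hits == 1

test_cache_stays_bounded()
test_cached_result_matches_fresh_computation()
print("ok")
```

Because the wrapper exposes cache_info() and cache_clear(), tests like these can assert on cache behaviour directly instead of relying only on a memory profiler.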