isovector / take2

the real accio analytics platform

Most Flyweight instances never call reclaim() #70

Open edmundnoble opened 9 years ago

edmundnoble commented 9 years ago

Only the Cluster and Snapshot flyweights ever call reclaim(), and they don't do so deterministically. As the database grows, this will lead to a memory leak. Two possible solutions are a) prevent the Flyweight from growing beyond a certain size, or b) ensure that reclaim() is called often enough.

To implement a), it might be worthwhile to re-architect the Flyweight trait into a cache (maybe an LRU cache) with a fixed maximum size, so that reclaim never needs to be called in the first place. To implement b), a simple (if hacky) solution is to schedule reclaim() to be called at specified intervals (possibly specified in a .conf file).
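For reference, a rough sketch of what a) could look like. It is written in Python purely for illustration and is not the project's Flyweight trait; `LruFlyweight`, `load`, and `max_size` are made-up names. The idea is just a map that remembers access order and drops the least recently used entry once the cap is exceeded.

```python
from collections import OrderedDict


class LruFlyweight:
    """Hypothetical size-bounded stand-in for a Flyweight (not the real trait)."""

    def __init__(self, load, max_size):
        if max_size < 1:
            raise ValueError("max_size must be at least 1")
        self._load = load          # assumed: function that builds an instance from its key
        self._max_size = max_size  # hard cap on the number of cached instances
        self._cache = OrderedDict()

    def get(self, key):
        if key in self._cache:
            # Cache hit: mark the entry as most recently used.
            self._cache.move_to_end(key)
            return self._cache[key]
        # Cache miss: build the instance and remember it.
        value = self._load(key)
        self._cache[key] = value
        if len(self._cache) > self._max_size:
            # Over the cap: evict the least recently used entry.
            self._cache.popitem(last=False)
        return value

    def __len__(self):
        return len(self._cache)
```

One caveat with this approach: if an evicted instance is still referenced elsewhere, a later `get` for the same key builds a second instance, so the usual one-instance-per-key guarantee of a flyweight no longer holds.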

b) has the advantage over a) of not limiting the Flyweight's size during an unexpected surge of activity. Also, a) may require periodically raising the maximum Flyweight size as the user base grows. However, because b) leaves the size unbounded between reclaim() calls, it may still allow memory leaks. a) has the advantage over b) of being more predictable, with less risk of a memory leak.
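A similarly rough sketch of b), again in Python for illustration only. `PeriodicReclaimer` and `interval_seconds` are made-up names, and `reclaim` stands in for whatever callable ends up invoking the real reclaim(); the interval is assumed to come from configuration. Nothing here bounds the Flyweight's size between runs, which is exactly the trade-off described above.

```python
import threading


class PeriodicReclaimer:
    """Hypothetical helper that calls a reclaim function at a fixed interval."""

    def __init__(self, reclaim, interval_seconds):
        self._reclaim = reclaim            # assumed: zero-argument callable
        self._interval = interval_seconds  # assumed: read from a .conf file
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._thread.start()

    def stop(self):
        # Wake the worker if it is waiting, then wait for it to exit.
        self._stop.set()
        self._thread.join()

    def _run(self):
        # Event.wait returns False on timeout and True once stop() was called,
        # so the loop reclaims after every interval until it is stopped.
        while not self._stop.wait(self._interval):
            self._reclaim()
```

Usage would be along the lines of `PeriodicReclaimer(some_flyweight.reclaim, 3600).start()`, where `some_flyweight` is a placeholder. If `reclaim` raises, the worker thread exits and no further runs happen, so a real version would probably want to catch and log errors.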

Leaning towards b) for now, not sure.

isovector commented 9 years ago

nightly reclaiming would probably make the most sense to me. there's an inherent issue here though, mainly in the commit model, whose entirety needs to be in memory IIRC; it's probably not as simple as just calling reclaim once in a while :/