zopefoundation / persistent

automatic persistence for Python objects
https://pypi.org/project/persistent/

Change PickleCache from simple LRU to a more modern policy (like windowed LFU/SLRU) #45

Open jamadden opened 8 years ago

jamadden commented 8 years ago

Over in RelStorage we've been having a discussion based on the observation that strict LRU isn't necessarily a good cache policy for varying workloads.

Basically, if most queries/transactions/requests use a certain set of objects, those objects shouldn't be evicted from the cache just because an outlier request comes in and scans a different BTree. LRU is prone to that problem; more adaptive approaches aren't.
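To make the failure mode concrete, here is a minimal Python sketch of a strict-LRU cache. It is a hypothetical `LRUCache` built on `collections.OrderedDict`, not persistent's PickleCache. With a 100-entry capacity, a 50-object working set is warmed up, then a single 200-object scan passes through:

```python
from collections import OrderedDict


class LRUCache:
    """Strict LRU: every access moves the key to the most-recently-used end."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()

    def get(self, key):
        """Return True on a hit, False on a miss (missing keys are then loaded)."""
        if key in self.data:
            self.data.move_to_end(key)
            return True
        self.data[key] = object()  # stand-in for the unpickled object
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict the least recently used key
        return False


cache = LRUCache(capacity=100)
hot = [f"hot-{i}" for i in range(50)]

for key in hot:  # warm the cache with the common working set
    cache.get(key)

for i in range(200):  # one outlier request scans a large, unrelated BTree
    cache.get(f"scan-{i}")

hits = sum(cache.get(key) for key in hot)
print(hits)  # 0: the scan pushed every hot object out
```

The one-off scan entries are treated as more valuable than objects that nearly every request uses, simply because they were touched more recently.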

@ben-manes pointed us to a policy that can be near optimal for a variety of workloads.

I think that would be a good fit for the PickleCache.
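As a rough illustration of how a more adaptive policy keeps a working set resident, here is a minimal Python sketch of a segmented LRU (SLRU), one of the policies named in the issue title. It is hypothetical and simplified, not RelStorage's implementation. New keys enter a small probationary segment. A key that is hit again while still on probation is promoted to a larger protected segment, and one-off scan entries never leave probation. A windowed-LFU design would additionally put a small LRU window in front and use access-frequency estimates to decide admission. That part is not shown here.

```python
from collections import OrderedDict


class SLRUCache:
    """Segmented LRU: new keys enter a probationary segment; a second hit
    promotes them to a protected segment that scans cannot flush."""

    def __init__(self, capacity, protected_fraction=0.8):
        self.protected_cap = int(capacity * protected_fraction)
        self.probation_cap = capacity - self.protected_cap
        self.probation = OrderedDict()
        self.protected = OrderedDict()

    def get(self, key):
        """Return True on a hit, False on a miss (missing keys are then loaded)."""
        if key in self.protected:
            self.protected.move_to_end(key)
            return True
        if key in self.probation:
            # Second access while on probation: promote to the protected segment.
            self.protected[key] = self.probation.pop(key)
            if len(self.protected) > self.protected_cap:
                # Demote the protected LRU entry back to probation rather than dropping it.
                old_key, old_value = self.protected.popitem(last=False)
                self._add_probation(old_key, old_value)
            return True
        self._add_probation(key, object())  # stand-in for the unpickled object
        return False

    def _add_probation(self, key, value):
        self.probation[key] = value
        if len(self.probation) > self.probation_cap:
            self.probation.popitem(last=False)  # evictions only come from probation


cache = SLRUCache(capacity=100)
hot = [f"hot-{i}" for i in range(50)]

for key in hot:
    cache.get(key)  # first access: lands in probation
    cache.get(key)  # second access: promoted to protected

for i in range(200):  # the same outlier scan only churns the probationary segment
    cache.get(f"scan-{i}")

hits = sum(cache.get(key) for key in hot)
print(hits)  # 50: the working set survived the scan
```

The trade-off is that a key must be seen twice while it is still on probation. A working set that cycles through more keys than the probationary segment holds is therefore never promoted. Frequency-based admission, as in windowed LFU, is one way to address that.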

I have a C and CFFI implementation that I'm using in RelStorage. I don't know whether we'll be able to share the code directly (it'd be cool to split it off into its own library on PyPI and distribute binary wheels just for it, so I don't have to do that for RelStorage), but we could at least start from it.

Other notes:

tseaver commented 8 years ago

Seems reasonable to me to experiment. Maybe distributing it as a separate package would keep the focus tight. We might need to add a hook (an environment variable, maybe?) to let people configure which cache implementation to use, at least for benchmarking purposes.
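One possible shape for such a hook is a small registry keyed by an environment variable and consulted when the cache is constructed. This is a minimal sketch. The variable name `PERSISTENT_CACHE_POLICY`, the registry, and the placeholder classes are all hypothetical, and persistent does not define any of them.

```python
import os


class LRUPolicy:
    """Hypothetical placeholder for the existing strict-LRU cache."""

    def __init__(self, capacity):
        self.capacity = capacity


class SLRUPolicy:
    """Hypothetical placeholder for an experimental segmented-LRU cache."""

    def __init__(self, capacity):
        self.capacity = capacity


# Registry of selectable implementations; the names are illustrative only.
CACHE_POLICIES = {
    "lru": LRUPolicy,
    "slru": SLRUPolicy,
}

# Hypothetical variable name, not something persistent reads today.
ENV_VAR = "PERSISTENT_CACHE_POLICY"


def make_cache(capacity, environ=os.environ):
    """Build a cache using the policy named in the environment, defaulting to LRU."""
    name = environ.get(ENV_VAR, "lru").strip().lower()
    try:
        factory = CACHE_POLICIES[name]
    except KeyError:
        raise ValueError(
            f"{ENV_VAR}={name!r} is not one of {sorted(CACHE_POLICIES)}"
        ) from None
    return factory(capacity)


cache = make_cache(400, environ={ENV_VAR: "slru"})
print(type(cache).__name__)  # SLRUPolicy
```

Passing `environ` explicitly keeps the selection testable. A benchmark run would simply set the variable in the process environment before any cache is created, and an unknown name fails loudly rather than silently falling back to LRU.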

jamadden commented 8 years ago

Distributing it (where "it" is an implementation of the PickleCache) as a separate package might be difficult, at least as far as the C version goes.

The CFFI version could be built and distributed separately, but it will always have overhead that a pure C implementation doesn't (although in RelStorage, my CFFI implementation is quite a bit faster than the CFFI implementation currently shipping with persistent).

Distributing a C version might be possible, but because it couldn't use the CPersistentRing struct that's embedded in each persistent object (the struct definition is quite different), we'd lose much of the benefit of that embedding, and it would complicate memory management 😢 (CPython memory management is something I know relatively little about).
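The benefit of an embedded ring is that the list links live inside each cached object, so reordering an entry costs a few pointer updates and no separate node allocation. The Python sketch below only mimics that intrusive layout. The `Ring` and `CachedObject` classes are hypothetical and are not persistent's C code. A different policy may need extra per-entry state, for example which segment an entry belongs to, that a plain two-pointer ring does not carry.

```python
class Ring:
    """Circular doubly-linked list whose links are attributes of the entries
    themselves (intrusive), so moving an entry never allocates a node."""

    def __init__(self):
        # The ring head is a sentinel that is its own neighbor when empty.
        self.r_prev = self
        self.r_next = self

    def add(self, obj):
        """Link obj at the most-recently-used end (just before the head)."""
        obj.r_prev = self.r_prev
        obj.r_next = self
        self.r_prev.r_next = obj
        self.r_prev = obj

    def remove(self, obj):
        """Unlink obj by pointing its neighbors at each other."""
        obj.r_prev.r_next = obj.r_next
        obj.r_next.r_prev = obj.r_prev
        obj.r_prev = obj.r_next = None

    def move_to_end(self, obj):
        """Mark obj as most recently used."""
        self.remove(obj)
        self.add(obj)

    def __iter__(self):
        """Yield entries from least to most recently used."""
        node = self.r_next
        while node is not self:
            yield node
            node = node.r_next


class CachedObject:
    """Stand-in for a persistent object: its ring links are ordinary fields."""

    __slots__ = ("oid", "r_prev", "r_next")

    def __init__(self, oid):
        self.oid = oid
        self.r_prev = self.r_next = None


ring = Ring()
objs = [CachedObject(oid) for oid in range(3)]
for obj in objs:
    ring.add(obj)
ring.move_to_end(objs[0])
print([o.oid for o in ring])  # [1, 2, 0]
```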

I could probably implement it here and run zodbshootout, but unfortunately that's not a very realistic workload (although RelStorage's cache did show notable improvements there).

jimfulton commented 6 years ago

Drive-by :) comments: