pytoolz / toolz

A functional standard library for Python.
http://toolz.readthedocs.org/
Other
4.57k stars 258 forks source link

LRU Cache #39

Open mrocklin opened 10 years ago

mrocklin commented 10 years ago

We should implement a few different caching/memoization options. Least Recently Used is a standard and often used system.

functools 3.2 has an lru_cache HOF/decorator.

I actually think that the way to go is to build a number of dict-like objects (dict, limited-space-dict, ...) and pass these into memoize which will act the same regardless of the data structure used to store the results. This follows principles of composition and encapsulation.

An LRU dict could be implemented from OrderedDict. I've seen recipes online for limited space dictionaries. I even have one lying around work somewhere.

mrocklin commented 10 years ago

I've been playing a bit with fixed space data structures lately. I think this should be a separate project.

eriknw commented 10 years ago

I recently stumbled upon this package: https://github.com/uqfoundation/klepto

mrocklin commented 10 years ago

Good find.

Thoughts:

I might make a fixed size dict package under pytoolz some time when I need to burn a few hours. Should be fun.

eriknw commented 10 years ago

Agreed, although I don't feel a need to have all the different policies (it doesn't hurt to have 'em though). Passing a dict-like object makes sense to me too, as it also allows one to pre-populate the cache.

I'm curious to see what you come up with, and, yeah, it does sound like fun.

clarkfitzg commented 10 years ago

An LRU cache would be valuable. Raymond Hettinger advocated using this version in production- his backport from Python 3.3.

mrocklin commented 10 years ago

Yeah functools' lrucache is handy. What I'd really like though is an LRU dict-like object that was separate from memoization. I don't think it's necessary to tie together the ideas of LRU and the idea of memoization. I think that these can be developed separately and then used together.

branning commented 9 years ago

Here's another relevant project: Thingking, a fixed-size data structure for memory mapping a URL into a Python virtual memory buffer. It allows lazy loading of larger-than-memory resources, and uses a fixed amount of memory as a cache. It is built on top of the (backported) lru_cache from functools, so only supports an LRU policy. Once mapped, you can access the resource as a numpy array or as a file.