DiskCache: Disk Backed Cache

DiskCache_ is an Apache2 licensed disk and file backed cache library, written in pure-Python, and compatible with Django.

The cloud-based computing of 2023 puts a premium on memory. Gigabytes of empty space are left on disks as processes vie for memory. Among these processes is Memcached (and sometimes Redis), which is used as a cache. Wouldn't it be nice to leverage empty disk space for caching?

Django is Python's most popular web framework and ships with several caching backends. Unfortunately, the file-based cache in Django is essentially broken. The culling method is random, and large caches repeatedly scan the cache directory, so operations slow linearly as the cache grows. Can you really allow it to take sixty milliseconds to store a key in a cache with a thousand items?

In Python, we can do better. And we can do it in pure-Python!

::

    In [1]: import pylibmc
    In [2]: client = pylibmc.Client(['127.0.0.1'], binary=True)
    In [3]: client[b'key'] = b'value'
    In [4]: %timeit client[b'key']

    10000 loops, best of 3: 25.4 µs per loop

    In [5]: import diskcache as dc
    In [6]: cache = dc.Cache('tmp')
    In [7]: cache[b'key'] = b'value'
    In [8]: %timeit cache[b'key']

    100000 loops, best of 3: 11.8 µs per loop

Note: Micro-benchmarks have their place but are not a substitute for real measurements. DiskCache offers cache benchmarks to defend its performance claims. Micro-optimizations are avoided but your mileage may vary.

DiskCache efficiently makes gigabytes of storage space available for caching. By leveraging rock-solid database libraries and memory-mapped files, cache performance can match and exceed industry-standard solutions. There's no need for a C compiler or running another process. Performance is a feature and testing has 100% coverage with unit tests and hours of stress.

Testimonials

`Daren Hasenkamp`_, Founder --

"It's a useful, simple API, just like I love about Redis. It has reduced
the amount of queries hitting my Elasticsearch cluster by over 25% for a
website that gets over a million users/day (100+ hits/second)."

`Mathias Petermann`_, Senior Linux System Engineer --

"I implemented it into a wrapper for our Ansible lookup modules and we were
able to speed up some Ansible runs by almost 3 times. DiskCache is saving
us a ton of time."

Does your company or website use DiskCache? Send us a message <contact@grantjenks.com> and let us know.

.. _Daren Hasenkamp: https://www.linkedin.com/in/daren-hasenkamp-93006438/
.. _Mathias Petermann: https://www.linkedin.com/in/mathias-petermann-a8aa273b/

Features

.. image:: https://github.com/grantjenks/python-diskcache/workflows/integration/badge.svg
   :target: https://github.com/grantjenks/python-diskcache/actions?query=workflow%3Aintegration

.. image:: https://github.com/grantjenks/python-diskcache/workflows/release/badge.svg
   :target: https://github.com/grantjenks/python-diskcache/actions?query=workflow%3Arelease

Quickstart

Installing DiskCache is simple with `pip <http://www.pip-installer.org/>`_::

    $ pip install diskcache

You can access documentation in the interpreter with Python's built-in help function::

    >>> import diskcache
    >>> help(diskcache)  # doctest: +SKIP

The core of DiskCache is three data types intended for caching. Cache objects manage a SQLite database and filesystem directory to store key and value pairs. FanoutCache provides a sharding layer to utilize multiple caches and DjangoCache integrates that with Django_::

    >>> from diskcache import Cache, FanoutCache, DjangoCache
    >>> help(Cache)  # doctest: +SKIP
    >>> help(FanoutCache)  # doctest: +SKIP
    >>> help(DjangoCache)  # doctest: +SKIP

Built atop the caching data types are Deque and Index, which work as cross-process, persistent replacements for Python's collections.deque and dict. These implement the sequence and mapping container base classes::

    >>> from diskcache import Deque, Index
    >>> help(Deque)  # doctest: +SKIP
    >>> help(Index)  # doctest: +SKIP

Finally, a number of recipes_ for cross-process synchronization are provided using an underlying cache. Features like memoization with cache stampede prevention, cross-process locking, and cross-process throttling are available::

    >>> from diskcache import memoize_stampede, Lock, throttle
    >>> help(memoize_stampede)  # doctest: +SKIP
    >>> help(Lock)  # doctest: +SKIP
    >>> help(throttle)  # doctest: +SKIP

Python's docstrings are a quick way to get started but are not intended as a replacement for the DiskCache Tutorial and DiskCache API Reference.

.. _Cache: http://www.grantjenks.com/docs/diskcache/tutorial.html#cache
.. _FanoutCache: http://www.grantjenks.com/docs/diskcache/tutorial.html#fanoutcache
.. _DjangoCache: http://www.grantjenks.com/docs/diskcache/tutorial.html#djangocache
.. _Django: https://www.djangoproject.com/
.. _Deque: http://www.grantjenks.com/docs/diskcache/tutorial.html#deque
.. _Index: http://www.grantjenks.com/docs/diskcache/tutorial.html#index
.. _recipes: http://www.grantjenks.com/docs/diskcache/tutorial.html#recipes

User Guide

For those wanting more details, this part of the documentation covers the tutorial, benchmarks, API reference, and development.

.. _DiskCache Tutorial: http://www.grantjenks.com/docs/diskcache/tutorial.html
.. _DiskCache Cache Benchmarks: http://www.grantjenks.com/docs/diskcache/cache-benchmarks.html
.. _DiskCache DjangoCache Benchmarks: http://www.grantjenks.com/docs/diskcache/djangocache-benchmarks.html
.. _`Talk: All Things Cached - SF Python 2017 Meetup`: http://www.grantjenks.com/docs/diskcache/sf-python-2017-meetup-talk.html
.. _`Case Study: Web Crawler`: http://www.grantjenks.com/docs/diskcache/case-study-web-crawler.html
.. _`Case Study: Landing Page Caching`: http://www.grantjenks.com/docs/diskcache/case-study-landing-page-caching.html
.. _DiskCache API Reference: http://www.grantjenks.com/docs/diskcache/api.html
.. _DiskCache Development: http://www.grantjenks.com/docs/diskcache/development.html

Comparisons

Comparisons to popular projects related to DiskCache_.

Key-Value Stores
................

DiskCache_ is mostly a simple key-value store. Feature comparisons with four other projects are shown in the tables below.

.. _dbm: https://docs.python.org/3/library/dbm.html
.. _shelve: https://docs.python.org/3/library/shelve.html
.. _sqlitedict: https://github.com/RaRe-Technologies/sqlitedict
.. _pickleDB: https://pythonhosted.org/pickleDB/

Features

================ ============= ========= ========= ============ ============
Feature          diskcache     dbm       shelve    sqlitedict   pickleDB
================ ============= ========= ========= ============ ============
Atomic?          Always        Maybe     Maybe     Maybe        No
Persistent?      Yes           Yes       Yes       Yes          Yes
Thread-safe?     Yes           No        No        Yes          No
Process-safe?    Yes           No        No        Maybe        No
Backend?         SQLite        DBM       DBM       SQLite       File
Serialization?   Customizable  None      Pickle    Customizable JSON
Data Types?      Mapping/Deque Mapping   Mapping   Mapping      Mapping
Ordering?        Insert/Sorted None      None      None         None
Eviction?        LRU/LFU/more  None      None      None         None
Vacuum?          Automatic     Maybe     Maybe     Manual       Automatic
Transactions?    Yes           No        No        Maybe        No
Multiprocessing? Yes           No        No        No           No
Forkable?        Yes           No        No        No           No
Metadata?        Yes           No        No        No           No
================ ============= ========= ========= ============ ============

Quality

================ ============= ========= ========= ============ ============
Project          diskcache     dbm       shelve    sqlitedict   pickleDB
================ ============= ========= ========= ============ ============
Tests?           Yes           Yes       Yes       Yes          Yes
Coverage?        Yes           Yes       Yes       Yes          No
Stress?          Yes           No        No        No           No
CI Tests?        Linux/Windows Yes       Yes       Linux        No
Python?          2/3/PyPy      All       All       2/3          2/3
License?         Apache2       Python    Python    Apache2      3-Clause BSD
Docs?            Extensive     Summary   Summary   Readme       Summary
Benchmarks?      Yes           No        No        No           No
Sources?         GitHub        GitHub    GitHub    GitHub       GitHub
Pure-Python?     Yes           Yes       Yes       Yes          Yes
Server?          No            No        No        No           No
Integrations?    Django        None      None      None         None
================ ============= ========= ========= ============ ============

Timings

These are rough measurements. See `DiskCache Cache Benchmarks`_ for more rigorous data.

================ ============= ========= ========= ============ ============
Project          diskcache     dbm       shelve    sqlitedict   pickleDB
================ ============= ========= ========= ============ ============
get              25 µs         36 µs     41 µs     513 µs       92 µs
set              198 µs        900 µs    928 µs    697 µs       1,020 µs
delete           248 µs        740 µs    702 µs    1,717 µs     1,020 µs
================ ============= ========= ========= ============ ============

Caching Libraries
.................

.. _klepto: https://pypi.org/project/klepto/
.. _joblib.Memory: https://joblib.readthedocs.io/en/latest/memory.html

Data Structures
...............

.. _dict: https://docs.python.org/3/library/stdtypes.html#typesmapping
.. _pandas: https://pandas.pydata.org/
.. _Sorted Containers: http://www.grantjenks.com/docs/sortedcontainers/

Pure-Python Databases
.....................

.. _ZODB: http://www.zodb.org/
.. _CodernityDB: https://pypi.org/project/CodernityDB/
.. _TinyDB: https://tinydb.readthedocs.io/

Object Relational Mappings (ORM)
................................

.. _Django ORM: https://docs.djangoproject.com/en/dev/topics/db/
.. _SQLAlchemy: https://www.sqlalchemy.org/
.. _Peewee: http://docs.peewee-orm.com/
.. _SQLObject: http://sqlobject.org/
.. _Pony ORM: https://ponyorm.com/

SQL Databases
.............

.. _SQLite: https://docs.python.org/3/library/sqlite3.html
.. _MySQL: https://dev.mysql.com/downloads/connector/python/
.. _PostgreSQL: http://initd.org/psycopg/
.. _Oracle DB: https://pypi.org/project/cx_Oracle/
.. _Microsoft SQL Server: https://pypi.org/project/pyodbc/

Other Databases
...............

.. _Memcached: https://pypi.org/project/python-memcached/
.. _MongoDB: https://api.mongodb.com/python/current/
.. _Redis: https://redis.io/clients#python
.. _LMDB: https://lmdb.readthedocs.io/
.. _BerkeleyDB: https://pypi.org/project/bsddb3/
.. _LevelDB: https://plyvel.readthedocs.io/

Reference

.. _DiskCache Documentation: http://www.grantjenks.com/docs/diskcache/
.. _DiskCache at PyPI: https://pypi.python.org/pypi/diskcache/
.. _DiskCache at GitHub: https://github.com/grantjenks/python-diskcache/
.. _DiskCache Issue Tracker: https://github.com/grantjenks/python-diskcache/issues/

License

Copyright 2016-2023 Grant Jenks

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

.. _DiskCache: http://www.grantjenks.com/docs/diskcache/