zodb / relstorage

A backend for ZODB that stores pickles in a relational database.

Cache: Allow using shared memory #446

Open · jamadden opened this issue 3 years ago

jamadden commented 3 years ago

Currently, the storage pickle cache is private memory, allocated per-process.

A common architecture for servers (e.g., gunicorn) is to spawn many worker processes on a single machine as a way to utilize multiple cores. Each such worker process gets its own pickle cache (per RelStorage storage, which could be greater than 1 in a multi-db scenario).

As the number of cores and workers goes up, the amount of memory needed to keep a reasonably sized RelStorage cache also goes up. Even if the memory is initially shared due to fork(), because of the nature of the cache, the pages quickly become dirty and have to be copied.

I've been investigating, and think it should be possible to move the storage caches into shared memory on Unix and Windows. The option that requires the least code changes and keeps most of the caching logic intact uses boost.interprocess (we're already using boost.intrusive in the cache).
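To make that concrete, here is a minimal standalone sketch of the boost.interprocess idea. It is illustrative only, not RelStorage's actual cache code; the segment name, map name, and sizes are made up. A map is allocated inside a named shared memory segment, so every process that attaches to the segment sees the same entries.

```cpp
// Illustrative sketch: an oid -> tid map living inside a named shared memory
// segment managed by boost.interprocess. Names and sizes are arbitrary, and
// the interprocess locking a real cache would need is omitted for brevity.
#include <boost/interprocess/managed_shared_memory.hpp>
#include <boost/interprocess/containers/map.hpp>
#include <boost/interprocess/allocators/allocator.hpp>
#include <cstdint>
#include <functional>
#include <utility>

namespace bip = boost::interprocess;

using segment_manager_t = bip::managed_shared_memory::segment_manager;
using entry_t           = std::pair<const std::int64_t, std::int64_t>;  // oid -> tid
using shm_allocator_t   = bip::allocator<entry_t, segment_manager_t>;
using shm_map_t         = bip::map<std::int64_t, std::int64_t,
                                   std::less<std::int64_t>, shm_allocator_t>;

int main() {
    // Create the named segment if it doesn't exist yet, otherwise attach to it.
    bip::managed_shared_memory segment(bip::open_or_create,
                                       "demo-cache-segment",
                                       64 * 1024 * 1024);  // 64 MB

    // Construct the map inside the segment, or find the one already there.
    shm_map_t *cache = segment.find_or_construct<shm_map_t>("demo-oid-to-tid")(
        std::less<std::int64_t>(),
        shm_allocator_t(segment.get_segment_manager()));

    // Every process that opens "demo-cache-segment" sees this entry.
    (*cache)[42] = 12345;
    return 0;
}
```

A real shared cache also has to use the segment's allocator and offset-based pointers (boost::interprocess::offset_ptr) rather than raw pointers into process-private memory, and it needs interprocess locking; both are omitted in this sketch.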

Benefits include:

Possible drawbacks/open questions include:

Initially, for the smallest code changes, shared memory caches will only work with processes on Unix that are related via fork(): this is because the C++ objects have vtables in them and those same vtable pointers must be valid in all processes accessing the cache. Only child processes have that guarantee (and only if RelStorage was loaded in the parent process before the fork()). Over time, it should be possible to remove this restriction.
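To illustrate the vtable restriction (again an illustrative sketch, not RelStorage code): an object with virtual methods placed in a shared mapping by the parent can be used from a forked child, because the child inherits the parent's address-space layout, so the vtable pointer stored in the object still points at valid code there. An unrelated process that mapped the same bytes could not safely make that virtual call.

```cpp
// Illustrative sketch of why fork()-related processes can share C++ objects
// that have vtables: the child inherits the parent's memory layout, so the
// vtable pointer the parent stored in shared memory is still meaningful.
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>
#include <cstdio>
#include <new>

struct Entry {
    virtual ~Entry() = default;
    virtual long weight() const { return 1; }  // virtual => the object carries a vtable pointer
};

int main() {
    // An anonymous shared mapping, inherited (and shared) across fork().
    void *mem = mmap(nullptr, sizeof(Entry), PROT_READ | PROT_WRITE,
                     MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED) return 1;
    Entry *e = new (mem) Entry();  // placement-new: the parent writes the vtable pointer

    if (fork() == 0) {
        // Child: this virtual call works only because the child shares the
        // parent's layout; an unrelated process could not safely do this.
        std::printf("child sees weight=%ld\n", e->weight());
        _exit(0);
    }
    wait(nullptr);
    e->~Entry();
    munmap(mem, sizeof(Entry));
    return 0;
}
```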

PythonLinks commented 3 years ago

First of all, thank you Jason for all of the wonderful work you have been doing. Your articles on the ZODB are just brilliant: https://dev.nextthought.com/blog/2019/10/intro-zodb.html and https://dev.nextthought.com/blog/2019/11/relstorage-30.html. I also hugely appreciate your efforts to document what you plan on doing. So let me expand on your email for beginners, and ask you a few obvious questions.

SHARED PICKLE CACHE You said: “Currently, the storage pickle cache is private memory, allocated per-process.” Your articles said:

“Multiple threads in the same process share a high-performance in-memory pickle cache to reduce the number of queries to the RDBMS. This is similar to ZEO, and the ZEO cache trace tools are supported.” Just to be clear, currently the cache is in the same process, and only in RelStorage, not in FileStorage. You want to expand this to shared memory across processes in RelStorage, but still not in FileStorage. Presumably because FileStorage can only write from one process.

CACHE INVALIDATION So I thought that the shared pickle cache eliminated the need for cache invalidation. Is that true? I understand that databases with a server process do the cache invalidation. But how does SQLite do cache invalidation if it has neither a fully shared pickle cache nor a shared server process?

FILE STORAGE CACHE I want to port the existing shared pickle cache to a single shared process in FileStorage. That raises the question: why can SQLite write from multiple processes, but FileStorage can only write from one process? The file lock could be acquired by any process.

I hope that this user feedback helps you.

Warm Regards Christopher Lozinski

https://PythonLinks.info | US tel: +1 650 614 1836 | EU tel: +48 32 361 3136 | Skype: clozinski

jamadden commented 3 years ago

(Edited to add some clarifications.)

Just to be clear, currently the cache is in the same process, and only in RelStorage... You want to expand this to shared memory across processes in RelStorage...

Yes.

but still not in FileStorage. Presumably because FileStorage can only write from one process.

No. Rather, it's because, just like the SQLite backend in RelStorage, FileStorage wouldn't benefit. Since the data exists only as a file on one machine and there is no server involved, FileStorage uses the operating system's filesystem cache as its pickle cache. That cache is automatically as big as it can be without impacting application memory needs.

CACHE INVALIDATION

The shared pickle cache has nothing at all to do with invalidation. All ZODB storages have to deal with invalidation in one way or another. ZEO does it via pushing invalidations from the server to clients. RelStorage does it via polling the server in each client (SQLite counts as a server for this purpose; by "server" I just mean "the central data store"). Changes in RelStorage 3 made that polling more efficient by sharing some state between different connections in the same process. (That state could also be moved to shared memory and re-used between processes, but (a) I don't have any indication that would actually be a significant benefit anymore — polling has gotten pretty fast already — and (b) the design of that state is all in Python objects and would be much harder to move compared to the pickle cache, which is already implemented in C++.)
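For readers unfamiliar with what "polling" means here, the following is a rough sketch of the idea using the SQLite C API. The table and column names are hypothetical, not RelStorage's actual schema or queries: each connection remembers the highest transaction id (tid) it has seen, and before reusing cached state it asks the store whether anything newer was committed.

```cpp
// Rough sketch of poll-based invalidation against SQLite. The table and
// column names (object_state, tid) are hypothetical, not RelStorage's schema.
#include <sqlite3.h>
#include <cstdint>
#include <cstdio>

// Return the newest tid visible in the store, or -1 on error.
static std::int64_t poll_newest_tid(sqlite3 *db) {
    sqlite3_stmt *stmt = nullptr;
    std::int64_t tid = -1;
    if (sqlite3_prepare_v2(db, "SELECT MAX(tid) FROM object_state",
                           -1, &stmt, nullptr) == SQLITE_OK &&
        sqlite3_step(stmt) == SQLITE_ROW) {
        tid = sqlite3_column_int64(stmt, 0);
    }
    sqlite3_finalize(stmt);
    return tid;
}

int main() {
    sqlite3 *db = nullptr;
    if (sqlite3_open("data.sqlite3", &db) != SQLITE_OK) return 1;

    std::int64_t last_seen = poll_newest_tid(db);  // remembered per connection

    // ... later, before trusting locally cached objects ...
    std::int64_t newest = poll_newest_tid(db);
    if (newest > last_seen) {
        std::printf("commits after tid %lld: invalidate affected cached objects\n",
                    static_cast<long long>(last_seen));
        last_seen = newest;
    }
    sqlite3_close(db);
    return 0;
}
```

A real poll also needs to learn which objects changed, not just that something changed, so that only those cache entries are invalidated; as noted above, RelStorage 3 shares some of this polling state between connections in the same process.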

FILE STORAGE CACHE I want to port the existing shared pickle cache to a single shared process in FileStorage.

I wouldn't recommend that.

That raises the question: why can SQLite write from multiple processes, but FileStorage can only write from one process? The file lock could be acquired by any process.

RelStorage and SQLite were designed to be used from multiple processes; FileStorage wasn't. It keeps certain state in memory (e.g., the index in the fsBTree), and it would have a hard time dealing with invalidations efficiently: each new read access would have to scan the tail of the file to find invalidations, i.e., it would have to implement polling based on reading the records in the file, which SQLite can do efficiently because of its on-disk indexes. It's much simpler just to use ZEO or SQLite.