zodb / relstorage

A backend for ZODB that stores pickles in a relational database.

Rewrite the cache using Cython on top of C++ #358

Closed jamadden closed 4 years ago

jamadden commented 4 years ago

This should fix #186: Instead of 328 bytes of bookkeeping data for each entry, we should be down to something like 88 bytes for single values and 72+ for multiple values. (This is not counting the map, but I didn't count that before.)

Other upsides:

| Benchmark | Master | This branch |
| --- | --- | --- |
| pop_eq | 1.47 sec | 498 ms: 2.95x faster (-66%) |
| pop_ne | 1.64 sec | 588 ms: 2.79x faster (-64%) |
| epop | 1.25 sec | 497 ms: 2.51x faster (-60%) |
| read | 345 ms | 182 ms: 1.90x faster (-47%) |
| mix | 1.63 sec | 599 ms: 2.72x faster (-63%) |

Downsides:

Before merging:

jamadden commented 4 years ago

I've spent more time profiling and testing this. The current approach uses the standard library's doubly-linked list to manage the LRU order.

I can save one pointer per element if I roll a doubly-linked list manually and embed the links in the nodes (similar to what the old C cache code did). That turned out to be a nightmare to get correct.
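For illustration, here is a minimal sketch of that embedded-links approach; the names (`Entry`, `LRUList`) are hypothetical, not relstorage's actual types. All the fiddly invariants live in `unlink`, which is where this style gets error-prone:

```cpp
#include <cstdint>

// Hypothetical sketch of the "links embedded in the node" design: the
// prev/next pointers live inside the cache entry itself, so the LRU
// list needs no separate node allocations (only 2 pointers of overhead).
struct Entry {
    std::uint64_t oid;
    Entry* prev = nullptr;
    Entry* next = nullptr;
};

class LRUList {
    Entry* head = nullptr;  // most recently used
    Entry* tail = nullptr;  // least recently used
public:
    void unlink(Entry* e) {
        // Every branch here is a chance to corrupt the list.
        if (e->prev) e->prev->next = e->next; else head = e->next;
        if (e->next) e->next->prev = e->prev; else tail = e->prev;
        e->prev = e->next = nullptr;
    }
    void push_front(Entry* e) {
        e->prev = nullptr;
        e->next = head;
        if (head) head->prev = e; else tail = e;
        head = e;
    }
    // A cache hit is move-to-front: unlink + push_front, both O(1).
    void on_hit(Entry* e) { unlink(e); push_front(e); }
};
```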

I could probably also save one pointer per element if I let the library containers allocate the nodes, but that would lose the ability to do the big bulk allocations.

Lastly, I can also save the equivalent of one pointer by using a deque of OIDs to hold the LRU order. That's not guaranteed O(1) (I have to do maintenance from time to time), but it should amortize out to that. We'd need an index in each node as well as the deque slot, so instead of three pointers per node it comes out to two.
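One plausible reading of that deque scheme (this is my reconstruction, not the actual code): each access appends the OID to the back and tombstones its old slot, and a periodic compaction sweep supplies the amortized O(1) behavior. A map stands in here for the per-node index field:

```cpp
#include <cstdint>
#include <deque>
#include <unordered_map>

// Speculative reconstruction of the deque-of-OIDs LRU, not relstorage's
// actual code. A touch appends the OID and tombstones its old slot; a
// compaction sweep runs once half the deque is dead, so the cost
// amortizes to O(1) per touch.
class DequeLRU {
    static constexpr std::uint64_t TOMBSTONE = ~std::uint64_t(0);
    std::deque<std::uint64_t> order;  // front = least recently used
    // In the real design this index would be a field in each cache node;
    // a map keeps the sketch self-contained.
    std::unordered_map<std::uint64_t, std::size_t> slot;
    std::size_t dead = 0;

public:
    void touch(std::uint64_t oid) {
        auto it = slot.find(oid);
        if (it != slot.end()) {           // already present: kill old slot
            order[it->second] = TOMBSTONE;
            ++dead;
        }
        order.push_back(oid);
        slot[oid] = order.size() - 1;
        if (dead * 2 > order.size())      // periodic maintenance
            compact();
    }

private:
    void compact() {
        std::deque<std::uint64_t> live;
        for (std::uint64_t oid : order) {
            if (oid == TOMBSTONE) continue;
            slot[oid] = live.size();
            live.push_back(oid);
        }
        order.swap(live);
        dead = 0;
    }
};
```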

None of those had compelling advantages AFAICS. I think we're in a decent sweet spot here. And performance has only improved:

| Benchmark | Master | This branch (early) | This branch (now) |
| --- | --- | --- | --- |
| pop_eq | 1.47 sec | 498 ms: 2.95x faster (-66%) | 353 ms |
| pop_ne | 1.64 sec | 588 ms: 2.79x faster (-64%) | 551 ms |
| epop | 1.25 sec | 497 ms: 2.51x faster (-60%) | 372 ms |
| read | 345 ms | 182 ms: 1.90x faster (-47%) | 91.1 ms |
| mix | 1.63 sec | 599 ms: 2.72x faster (-63%) | 522 ms |

jamadden commented 4 years ago

After writing my own containers to get the minimal size improvements, it occurred to me that Boost might already have some... and sure enough it does: a whole "intrusive" library.

The cache has two indices of objects, one for LRU order and one for lookup. With intrusive containers that's five pointers, all of which would be embedded in the object itself. And that would be the total of the overhead.

With the std containers, the map needs three pointers per entry, plus a separate copy of the (pointer-sized) key, for a total of four pointers. The other index needs three pointers, plus the iterator object embedded in the object, which I think is another two pointers, adding up to five. In total that's nine pointers, four of which can go away, saving us 32 bytes per object (if I've done the math correctly). (And I think there's some other overhead in the unordered_map I'm not accounting for.)

Only downside I see is that we'd be switching from the hash-based lookup to a tree-based one: amortized O(1) to something logarithmic.
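A minimal sketch of what that layout looks like with Boost.Intrusive (the `Entry` type is hypothetical, not the actual relstorage class): the list hook embeds two pointers and the rb-tree set hook three, which is the five pointers of total overhead described above.

```cpp
#include <cstdint>
#include <boost/intrusive/list.hpp>
#include <boost/intrusive/set.hpp>

namespace bi = boost::intrusive;

// Hypothetical entry: both indices are hooks embedded in the object, so
// the containers never allocate nodes of their own.
struct Entry : bi::list_base_hook<>,   // LRU order: 2 pointers
               bi::set_base_hook<> {   // OID lookup (rb-tree): 3 pointers
    std::uint64_t oid;
    explicit Entry(std::uint64_t o) : oid(o) {}
    // The set is ordered by OID, hence the logarithmic lookup noted above.
    friend bool operator<(const Entry& a, const Entry& b) {
        return a.oid < b.oid;
    }
};

using LRUList = bi::list<Entry>;   // holds references, doesn't own
using Lookup  = bi::set<Entry>;

// Usage sketch: one allocation per entry, then just link it into both.
//   Entry* e = new Entry(42);
//   lru.push_front(*e);
//   index.insert(*e);
// (Entries must be unlinked from both containers before being freed.)
```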

It shouldn't be hard to try out. It seems like it's worth doing.