zopefoundation / persistent

automatic persistence for Python objects
https://pypi.org/project/persistent/

Shared read-only state between objects with copy on write #93

Open jimfulton opened 6 years ago

jimfulton commented 6 years ago

There's a lot of interest in using ZODB with asynchronous frameworks, especially for applications that block on network requests to services. From a purely programming perspective, gevent makes this quite tractable, but the cost of maintaining many open ZODB connections with their own caches is a major challenge. The cost of maintaining many open connections could be mitigated if data could be shared among their caches.

One way to do this would be to have a shared cache of read-only state objects. Consider the extremely common case of persistent objects that store their data in instance dictionaries (leaving aside non-persistent subobjects for the sake of discussion). Set-state for such objects could simply install the state as the instance dictionary. The first attribute assignment on such an object could then copy the state dict before writing to it. This would allow use of shared immutable state dicts, requiring no copying for read-only operations. Note that in this scenario, only state is shared, not persistent objects.
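A minimal sketch of the copy-on-write behaviour described above, in plain Python. `CopyOnWriteObject`, `set_shared_state`, and the `_state` / `_owns_state` slots are hypothetical names chosen for illustration; none of this is the `persistent` API, and a real implementation would live in `__setstate__` and the C extension.

```python
class CopyOnWriteObject:
    """Hypothetical stand-in for a persistent object (not the persistent API)."""

    # Bookkeeping lives in slots so it never ends up in the shared state.
    __slots__ = ("_state", "_owns_state")

    def __init__(self):
        object.__setattr__(self, "_state", {})
        object.__setattr__(self, "_owns_state", True)

    def set_shared_state(self, state):
        # Adopt the shared dict as-is: no copy is made for read-only use.
        object.__setattr__(self, "_state", state)
        object.__setattr__(self, "_owns_state", False)

    def __getattr__(self, name):
        # Only reached when normal lookup (the slots) fails.
        if name in CopyOnWriteObject.__slots__:
            raise AttributeError(name)
        try:
            return self._state[name]
        except KeyError:
            raise AttributeError(name) from None

    def __setattr__(self, name, value):
        if not self._owns_state:
            # First write: take a private copy, leave the shared dict untouched.
            object.__setattr__(self, "_state", dict(self._state))
            object.__setattr__(self, "_owns_state", True)
        self._state[name] = value


shared = {"title": "doc", "size": 3}
a = CopyOnWriteObject()
a.set_shared_state(shared)
b = CopyOnWriteObject()
b.set_shared_state(shared)

a.size = 4  # triggers the copy for `a` only
```

After the write, `a` holds its own dict while `b` still points at `shared`, which is unchanged. The sketch relies on every writer going through `__setattr__`; it does nothing to stop code that mutates the shared dict directly.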

You could use slots or secondary dictionaries for non-shared mutable state.

Similar schemes could be used for BTrees and Buckets, although we'd need to introduce new Python subobjects to represent shared state.

To make this work, we'd likely want to create persistent subobjects that disallowed storing non-persistent mutable subobjects, which would have other benefits.

jamadden commented 6 years ago

This is somewhat similar to RelStorage's in-memory pickle state cache, which is shared by all Connections of a Storage, but operating on the unpickled data (and then of course copying it). I like the idea!

A challenge there is making such a shared cache effective with the different MVCC states that each Connection may be seeing. RelStorage has a complicated system of "checkpoints" it uses to accomplish this that works OK for short-lived transactions and Connections that don't drift too far apart from each other in terms of their MVCC state.

jimfulton commented 6 years ago

This cache would be keyed by oid + serial, so it would be orthogonal to MVCC. It would store Python objects, so there would be no additional deserialization overhead. Because the sharing would be at the object level, there would be memory savings, not just savings in loading objects.
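To illustrate why keying by oid + serial sidesteps MVCC, here is a rough sketch. `SharedStateCache` and its `loader` callable are hypothetical names for illustration, not part of ZODB or `persistent`; eviction and thread safety are left out.

```python
from types import MappingProxyType


class SharedStateCache:
    """Hypothetical process-wide cache of already-unpickled, read-only states."""

    def __init__(self, loader):
        # loader(oid, serial) -> dict is an assumed callable that loads and
        # unpickles one specific revision of an object.
        self._loader = loader
        self._states = {}

    def get(self, oid, serial):
        key = (oid, serial)
        try:
            return self._states[key]
        except KeyError:
            # Wrap the state so readers cannot modify it by accident.
            state = MappingProxyType(dict(self._loader(oid, serial)))
            return self._states.setdefault(key, state)


calls = []


def load(oid, serial):
    calls.append((oid, serial))
    return {"value": serial}


cache = SharedStateCache(load)
s1 = cache.get(b"\x01", 10)  # connection A, viewing serial 10
s2 = cache.get(b"\x01", 10)  # connection B, same view: same object back
s3 = cache.get(b"\x01", 20)  # connection C, newer view: a separate entry
```

Because each key names one immutable revision, connections at different MVCC snapshots never conflict; they just ask for different keys.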

jimfulton commented 6 years ago

If we could store non-dicts as __dict__, then we could use immutable dicts as shared state and trigger copy on failed setitem (or on noticing non-dicts), requiring no change to persistent state metadata.
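For reference, a small sketch of the constraint and of the "copy on failed setitem" idea. It assumes CPython, where assigning a non-dict to `__dict__` on an ordinary instance raises `TypeError`. `Plain` and `write` are hypothetical names, and `MappingProxyType` stands in for whatever immutable dict type would actually be used.

```python
from types import MappingProxyType


class Plain:
    pass


frozen = MappingProxyType({"x": 1})

obj = Plain()
try:
    # Today this is rejected: __dict__ must be a real dict (or a subclass).
    obj.__dict__ = frozen
    rejected = False
except TypeError:
    rejected = True


def write(state, key, value):
    """Set state[key]; if the state is read-only, copy it first."""
    try:
        state[key] = value
        return state
    except TypeError:
        # The failed setitem is the copy-on-write trigger.
        private = dict(state)
        private[key] = value
        return private


private = write(frozen, "x", 2)   # copies, since `frozen` is read-only
same = write(private, "y", 3)     # already private, written in place
```

The second half shows the trigger working on a separately held state mapping. Making it work through `__dict__` itself is the part that would need interpreter or C-level support.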

jamadden commented 6 years ago

> This cache would be keyed by oid + serial, so it would be orthogonal to MVCC.

Ah, I see. It helps that the current laughably misnamed "pickle cache" knows what (oid, serial) values it's going to be requesting; the RelStorage case just has to deal with arbitrary requests over time.

davisagli commented 6 years ago

Nice idea!