Closed azmeuk closed 3 years ago
In general I don't think this is possible. PersistentMapping
objects may have persistent objects as keys (and values) and during conflict resolution it's not possible to hash those, as I'm sure you remember.
Even if that was changed, the way that PersistentReference
hashes (and compares) will be very different from the way the actual objects hash and compare, so the results cannot be directly correlated. That is, a set()
of PersistentReference
objects might be very different from a set()
of the actual objects. Two objects with different OIDs might hash and compare themselves equal, but two PersistentReferences
representing the same objects would not.
Imagine two transactions, each adding a persistent object (whether key or value) to a mapping. The objects themselves compare and hash equally, so they should be merged. But because they get different OIDs by the time conflict resolution kicks in, it's impossible for the mapping to know that they are equivalent. Best case scenario, the pickle writes out two "identical" objects referring to two different OIDs and on unpicking, whichever one was listed last in the pickle is the one that gets used. But garbage collection and the like will still see both OIDs as referenced. (Worst case scenario, the dict winds up badly corrupted.)
Indeed, this implementation will not solve conflicts if persistent objects are used as a mapping key (although there should not be any problem with persistent objects as values, as no hashing is involved for values, only comparisons).
Though, is this a problem? This patch still brings conflict resolution in some cases (when persistent objects are not used as keys), and should keep the current behavior for others (when they are not).
Though, is this a problem?
I think so, for a few reasons.
First, persistent objects as keys will produce incorrect results on Python 2 or (on Python 3) if something like https://github.com/zopefoundation/ZODB/pull/278 is ever merged. We're relying on the unhashability of PersistentReference
to protect us by raising TypeError
in that case; if for any reason it doesn't (as on Python 2 or with that PR) then this code is silently broken in ways which will lose data. That's not good.
Second, there's still a problem with persistent values. Even though they aren't hashed, they are checked for equality (com.get(k) == old.get(k):
). Unless they represent the identical object, that will raise an exception, meaning we've potentially done a lot of work getting to that point for no reason. Worse, the exception raised will be logged as an error with a traceback, likely alarming and confusing many users. With the class not even implementing _p_resolveConflicts
, ZODB can take a fast shortcut and avoid all that work.
I also don't understand why there's special handling for dicts. Why not special handling for lists, or sets or any other mutable data structure? This seems like a hard-to-explain arbitrary rule that's expensive to implement. It also seems like something that applications may very well not want. Suppose my values are dicts that come in from clients sending JSON: they should be atomic, all-or-nothing updates. I never want to try to merge them.
Basically, I'm concerned about the corner cases and potential for breakage and performance loss in a core data structure.
I do understand the desire, though! Sometimes its a shame to redo a transaction because some tiny little dict somewhere changed by one key. Perhaps you could publish a new data structure that's carefully documented and optimized for this conflict-resolving case, e.g.,
__hash__
or __eq__
of keys or values…).I'm not sure that it should belong here, though, because the policies around merging sub-datastructures are likely to be application specific.
Jason Madden wrote at 2020-9-18 09:04 -0700:
In general I don't think this is possible.
PersistentMapping
objects may have persistent objects as keys (and values) and during conflict resolution it's not possible to hash those, as I'm sure you remember.
Persistent objects as keys alone do not exclude conflict resolution --
example the Bucket
conflict resolution in the BTrees
package.
This strategy is based on the fact that each of the concurrent
activities has put the keys at the correct place in the data structure
(using the keys native ordering).
If the conflicting keys are at different places, then the
conflict may be resolved without the need to compare the keys themselves.
In principle, the same could apply to PersistentMapping
as well
if the conflict resolution had access to the storage cells
used for the mapping. In this case, conflicting keys stored in
different cells could be resolved without the need to know their
hash value -- as they are in the correct cells.
The current implementation does not give access to the storage cells.
Thus, for the moment no conflict resolution for PersistentMapping
involving persistent keys. I even believe that any persistent
key in a PersistentMapping
can render conflict resolution unreliable,
even if this key is not involved in a conflict. Because adding/deleting
may cause the underlying storage structure to be restructured
and in this case, the hash recomputed. This would give wrong
results as a PersistentReference
hash is different from that
of the corresponding persistent object.
Éloi Rivard wrote at 2020-9-18 09:25 -0700:
... Though, is this a problem? This patch still brings conflict resolution in some cases (when persistent objects are not used as keys), and should keep the current behavior for others (when they are not).
Conflict resolution heavily depends on the applications. Adding conflict resolution later on may break applictions which formerly worked correctly. Thus, adding conflict resolution is not necessarily good.
Example is again the Bucket
conflict resolution in the BTrees
package.
An important application thereof is ZCatalog
. This application
uses code of the form if not tree: ...persistent_modify...
This use disallows the resolution of a conflict where one the
the conflicting parties has emptied a tree/bucket because this
would make the test not tree
unsafe: the conflict resolution could
make the tree unempty but cannot undo the "persistent_modification".
Keep in mind that any conflict resolution is potentially dangerous. Whether a given strategy is acceptable heavily depends on the applications of the data structure.
This implements a simple conflict resolution strategy for PersistentMapping.