Better POSKeyError handling

jimfulton commented 6 years ago

Dealing with POSKeyErrors is way too painful.

I think there are some things we can do to make this less painful:

When a POSKeyError is encountered when loading object state, include the referring object's id and database id in the error and make sure they're displayed when the error is displayed. Doh!
Add an option to return a broken object without erroring and still log the error.
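A minimal sketch of the first proposal: an exception that carries the referring object's oid and database name and shows them in its message. `ReferringPOSKeyError`, `_hex_oid` and the field names are hypothetical, not ZODB API. `POSKeyError` here is a local stand-in for `ZODB.POSException.POSKeyError` so the example runs without ZODB.

```python
class POSKeyError(KeyError):
    """Local stand-in for ZODB.POSException.POSKeyError."""


def _hex_oid(oid):
    # ZODB oids are 8-byte big-endian strings; show them as hex integers.
    return f"0x{int.from_bytes(oid, 'big'):x}"


class ReferringPOSKeyError(POSKeyError):
    """Hypothetical POSKeyError that also names who held the reference."""

    def __init__(self, oid, referrer_oid=None, database_name=None):
        super().__init__(oid)
        self.oid = oid
        self.referrer_oid = referrer_oid
        self.database_name = database_name

    def __str__(self):
        # Only include the context that is actually known.
        parts = [f"missing oid {_hex_oid(self.oid)}"]
        if self.referrer_oid is not None:
            parts.append(f"referenced by oid {_hex_oid(self.referrer_oid)}")
        if self.database_name is not None:
            parts.append(f"in database {self.database_name!r}")
        return ", ".join(parts)


err = ReferringPOSKeyError(b"\x00" * 7 + b"\x2a", b"\x00" * 6 + b"\x12\x34", "main")
print(err)  # missing oid 0x2a, referenced by oid 0x1234, in database 'main'
```

Because the class still derives from (the stand-in) POSKeyError, existing `except POSKeyError` handlers would keep working; only what the error displays changes.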

jmuchemb commented 6 years ago

When a POSKeyError is encountered when loading object state, include the referring object database id in the error and make sure they're displayed when the error is displayed.

IIUC, this is only useful when using multiple ZODB databases in an application.

Add an option to return a broken object without erroring and still log the error.

I don't understand.

jimfulton commented 6 years ago

On Tue, May 29, 2018 at 5:25 AM, Julien Muchembled <notifications@github.com> wrote:

When a POSKeyError is encountered when loading object state, include the referring object database id in the error and make sure they're displayed when the error is displayed.

IIUC, this is only useful when using multiple ZODB databases in an application.

Sorry, that should have read: "the referring object's id and database id." Fixed. The main idea here is to figure out what the referring object is.

Add an option to return a broken object without erroring and still log the error.

I don't understand.

Today, POSKeyErrors are very hard to deal with because the error occurs at a very low level. Optionally allowing a broken object to be used elevates these errors to the application level, where they're potentially easier to deal with. We want to log the error even if we return a broken object without raising an application exception.
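One way to read the broken-object option, as a sketch: a loader flag that keeps today's behaviour by default and, when enabled, logs the error and hands back a placeholder instead of raising. `load_state`, `return_broken` and `MissingStateBroken` are hypothetical names, the storage is assumed to be a plain dict mapping oids to state, and the placeholder is not ZODB's own broken-object class.

```python
import logging

logger = logging.getLogger("poskey_sketch")


class POSKeyError(KeyError):
    """Local stand-in for ZODB.POSException.POSKeyError."""


class MissingStateBroken:
    """Hypothetical placeholder returned instead of raising."""

    def __init__(self, oid):
        self.oid = oid

    def __repr__(self):
        return f"<MissingStateBroken oid=0x{int.from_bytes(self.oid, 'big'):x}>"


def load_state(storage, oid, return_broken=False):
    """Load a record from a dict standing in for a storage.

    return_broken=False (default): a missing record raises, as today.
    return_broken=True: the error is logged and a placeholder is returned,
    moving the decision up to the application.
    """
    try:
        return storage[oid]
    except KeyError:
        if not return_broken:
            raise POSKeyError(oid) from None
        logger.error("POSKeyError for oid 0x%x; returning broken placeholder",
                     int.from_bytes(oid, "big"))
        return MissingStateBroken(oid)


storage = {b"\x00" * 8: {"title": "root"}}
missing_oid = b"\x00" * 7 + b"\x05"
print(load_state(storage, missing_oid, return_broken=True))
# <MissingStateBroken oid=0x5>   (the error is also logged)
```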

vpelletier commented 6 years ago

I worked on this issue during the sprint today. Here is how far I got:

POSKeyError is raised when the code is trying to fill in the state of an existing ghost object. At this point the application already thinks it has an instance (a ghost being unghostified), so it's not as straightforward as the usual BrokenObject use case, where it's the class that does not exist, which is not an internal inconsistency of the database.

Intercepting the POSKeyError exception inside ZODB may not help the application progress much, because the replacement instance would need to make hard decisions. A few examples:

the missing object could be a BTree node, so the stand-in would need to define how it is iterated over (the desired outcome would likely be to behave like an empty list: just skip to the next bucket)
the stand-in could be created because the missing object is being evaluated as a boolean in an if statement, so it would need to know which branch is "safe" to take when the object's content is unknown

If the replacement instance cannot make such decisions, it will likely just raise an exception, which brings us back to square one.
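To make the two examples concrete, here is a hypothetical stand-in class (not anything ZODB provides). Iteration has a plausible neutral answer, but truth testing does not, so the sketch raises there, which is exactly the "back to square one" outcome.

```python
class MissingStandIn:
    """Hypothetical placeholder for an object whose state is missing."""

    def __init__(self, oid):
        self.oid = oid

    def __iter__(self):
        # A neutral answer exists: act like an empty bucket so a walk over
        # buckets simply moves on to the next one.
        return iter(())

    def __len__(self):
        return 0

    def __bool__(self):
        # No neutral answer: returning False would silently choose a branch
        # of the caller's `if`, so the stand-in gives up. Python consults
        # __bool__ before __len__, so this is what `if obj:` reaches.
        raise RuntimeError(f"state of {self.oid!r} is missing; truth value unknown")


buckets = [[1, 2], MissingStandIn(b"\x07"), [3]]
print([key for bucket in buckets for key in bucket])  # [1, 2, 3]

try:
    if MissingStandIn(b"\x07"):
        pass
except RuntimeError as exc:
    print(exc)
```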

So my feeling is that the POSKeyError should propagate the way it already does and be handled by the application (data structure class, ...) as appropriate, rather than having the ZODB layer make the decision for the application.
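A sketch of handling the error at the application level instead, assuming a toy container that knows its bucket oids and a `load` callable that raises POSKeyError for missing records. `BucketChain` and its skip-and-record policy are illustrative assumptions; another data structure might reasonably choose to re-raise.

```python
import logging

logger = logging.getLogger("poskey_sketch")


class POSKeyError(KeyError):
    """Local stand-in for ZODB.POSException.POSKeyError."""


class BucketChain:
    """Hypothetical application-level container of bucket oids.

    The storage layer raises POSKeyError unchanged; this class decides
    what a missing bucket means for it: log it, remember it, skip it.
    """

    def __init__(self, bucket_oids, load):
        self.bucket_oids = list(bucket_oids)
        self.load = load      # callable: oid -> list of keys; may raise
        self.missing = []     # oids skipped during the last iteration

    def __iter__(self):
        self.missing = []
        for oid in self.bucket_oids:
            try:
                bucket = self.load(oid)
            except POSKeyError:
                logger.warning("skipping missing bucket %r", oid)
                self.missing.append(oid)
                continue
            yield from bucket


store = {"b1": [1, 2], "b3": [5]}


def load(oid):
    try:
        return store[oid]
    except KeyError:
        raise POSKeyError(oid) from None


chain = BucketChain(["b1", "b2", "b3"], load)
print(list(chain), chain.missing)  # [1, 2, 5] ['b2']
```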

When I discussed this with @jimfulton, he said this error is typically reported when packing: the pack walks the object tree to find orphans (any node not reached by the walk is an orphan object), and a reference to a missing object raises this exception and prevents the pack from finishing.
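The pack failure mode can be seen in a toy mark phase: walk references from the root, treat everything not reached as garbage, and note that a single dangling reference aborts the walk. The `records` mapping (oid to list of referenced oids) is an assumed simplification, and `find_reachable`/`find_garbage` are hypothetical names; real storages walk pickled references quite differently.

```python
from collections import deque


class POSKeyError(KeyError):
    """Local stand-in for ZODB.POSException.POSKeyError."""


def find_reachable(records, root_oid):
    """Mark phase of a pack-style garbage collection (toy model)."""
    reachable = {root_oid}
    queue = deque([root_oid])
    while queue:
        oid = queue.popleft()
        try:
            refs = records[oid]
        except KeyError:
            # Dangling reference: nothing below it can be marked, so the
            # whole walk (and therefore the pack) stops here.
            raise POSKeyError(oid) from None
        for ref in refs:
            if ref not in reachable:
                reachable.add(ref)
                queue.append(ref)
    return reachable


def find_garbage(records, root_oid):
    """Every stored oid not reached from the root is an orphan."""
    return set(records) - find_reachable(records, root_oid)


records = {"root": ["a"], "a": ["b"], "b": [], "orphan": []}
print(find_garbage(records, "root"))  # {'orphan'}
```

Appending a missing oid such as `"gone"` to `records["b"]` would make the same call raise instead of returning any garbage at all.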

In the specific case of pack:

the container object should be fairly easy to obtain (how easy depends on the tree-walking algorithm used), so it could be included in the error message. Rather than just the immediate container, I think it would be useful to include the whole hierarchy in the error (knowing that the missing object was part of an OOBucket with oid 0x1234 likely does not help much).
if the POSKeyError really only occurs when packing, the missing object is not needed in normal database use: maybe the pack tree walker should just catch the error and treat the reference as pointing to nothing. The whole (original) subtree rooted at the missing object, if any, would then be discarded. This would be optional, so that corruptions are still reported by default while packing remains possible when requested.
going further with the idea of ignoring such corruptions, and to let the user understand and inspect their impact, pack could produce a new file containing all objects that were discarded from the main database, similar to the /lost+found directory on ext2/3/4. The exact data structure to use is still undefined: what would be the most convenient way to let the user browse a collection of disjoint subtrees? Some would be incomplete internal persistent structures, so their class may be unable to work with them (e.g. a discarded BTree bucket whose parent node is still in use). One idea would be a container type playing the role of /lost+found, with orphans attached to it in the regular ZODB style. Tools would then be needed to work on these, including deciding whether everything but the root should be forced to be broken objects (to ease browsing).
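The three pack ideas can be combined in one sketch under a toy model (a dict from oid to referenced oids): remember each object's parent so the error can report the whole path from the root, optionally treat a missing object as referencing nothing, and set discarded records aside as a lost+found set rather than dropping them. `DanglingReferenceError`, `mark`, `pack_with_lost_found` and `ignore_missing` are all hypothetical.

```python
from collections import deque


class DanglingReferenceError(KeyError):
    """Hypothetical POSKeyError variant carrying the referring path."""

    def __init__(self, oid, path):
        super().__init__(oid)
        self.oid = oid
        self.path = path  # oids from the root down to the referrer

    def __str__(self):
        return f"missing {self.oid!r} via {' -> '.join(map(repr, self.path))}"


def mark(records, root_oid, ignore_missing=False):
    """Return (reachable oids, missing oids).

    Each object's first-seen parent is recorded so an error can report the
    whole hierarchy above a missing object, not just its container.
    """
    parent = {root_oid: None}
    missing = set()
    queue = deque([root_oid])
    while queue:
        oid = queue.popleft()
        if oid not in records:
            if not ignore_missing:
                path, p = [], parent[oid]
                while p is not None:
                    path.append(p)
                    p = parent[p]
                raise DanglingReferenceError(oid, path[::-1])
            missing.add(oid)  # treated as referencing nothing
            continue
        for ref in records[oid]:
            if ref not in parent:
                parent[ref] = oid
                queue.append(ref)
    return set(parent) - missing, missing


def pack_with_lost_found(records, root_oid):
    """Split records into kept and lost+found instead of dropping them."""
    reachable, missing = mark(records, root_oid, ignore_missing=True)
    kept = {o: refs for o, refs in records.items() if o in reachable}
    lost = {o: refs for o, refs in records.items() if o not in reachable}
    # Subtree roots: discarded objects no other discarded object refers to.
    referenced = {r for refs in lost.values() for r in refs}
    lost_roots = sorted(o for o in lost if o not in referenced)
    return kept, lost, lost_roots, missing


records = {
    "root": ["tree"], "tree": ["bucket1", "bucket2"],
    "bucket1": [], "bucket2": ["gone"],
    "old": ["old_child"], "old_child": [],  # ordinary garbage
    "under_gone": [],  # was only reachable through the missing object
}
kept, lost, lost_roots, missing = pack_with_lost_found(records, "root")
print(sorted(kept), lost_roots, missing)
```

Objects that were only reachable through the missing one land in `lost` next to ordinary garbage; telling the two apart would require the missing object's own references, which are exactly what is gone.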

ml31415 commented 2 years ago

I'd vote for better error handling in general. It's not only POSKeyErrors: all sorts of zlib.error and UnpicklingError exceptions can creep into the database over the years and then break packing, object iteration, and retrieval.
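A sketch of that more general handling, assuming records are zlib-compressed pickles in a dict keyed by oid (a simplification, not ZODB's storage format): missing records, zlib failures and unpickling failures are logged and collected per oid instead of stopping the whole pass. `load_all` and `LOAD_ERRORS` are hypothetical, and which exceptions belong in that tuple is a judgment call.

```python
import logging
import pickle
import zlib

logger = logging.getLogger("load_sketch")

# Failures this sketch treats as "record missing or damaged". In this toy,
# KeyError plays the role of POSKeyError for a missing oid.
LOAD_ERRORS = (KeyError, zlib.error, pickle.UnpicklingError, EOFError)


def load_all(storage, oids):
    """Decode each requested record; report failures instead of aborting.

    storage maps oid -> zlib-compressed pickle bytes (an assumed format).
    Returns (objects, failures), both keyed by oid.
    """
    objects, failures = {}, {}
    for oid in oids:
        try:
            objects[oid] = pickle.loads(zlib.decompress(storage[oid]))
        except LOAD_ERRORS as exc:
            logger.error("cannot load %r: %s: %s", oid, type(exc).__name__, exc)
            failures[oid] = exc
    return objects, failures


storage = {
    "a": zlib.compress(pickle.dumps({"title": "ok"})),
    "b": b"not zlib data",        # fails in zlib.decompress
    "c": zlib.compress(b"\xff"),  # fails in pickle.loads (invalid opcode)
}
objects, failures = load_all(storage, ["a", "b", "c", "d"])  # "d" is missing
print(objects)          # {'a': {'title': 'ok'}}
print(sorted(failures))  # ['b', 'c', 'd']
```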

zopefoundation / ZODB

Better POSKeyError handling #206