zopefoundation / BTrees

BTree persisted in ZODB throws KeyError when accessing item clearly in the Btree - maxKey() causes an Access Violation #102

Open matthchr opened 5 years ago

matthchr commented 5 years ago

I have a BTree which I am persisting via ZODB. In this BTree I add and remove items over time. Somehow, I've ended up with a BTree which has a key for a particular item, but raises a KeyError if I try to actually access that item. Whatever happened to this BTree has been persisted to my ZODB, and even closing and reloading from the DB shows the same behavior.

In the below code-snippets, repro is the BTree in question loaded from ZODB.

print('job-0000000014' in repro)  # prints True
print('job-0000000386' in repro)  # prints False
for item in repro:
    print(item)

shows

...
job-0000000014
...
job-0000000386

Finally, this code raises a KeyError on job-0000000386:

for item in repro:
    print('{} = {}'.format(item, repro[item]))

I stepped through a bit in a debugger and it seems that this item is the last item in the last bucket, and interestingly, I can get the item from the bucket directly -- the following code works fine and returns the object t2.

bucket = repro._firstbucket
while bucket._next is not None:
    bucket = bucket._next
t1 = bucket.get('job-0000000281')
t2 = bucket.get('job-0000000386')

Digging a bit more, it seems that calling repro.maxKey() triggers an access violation: -1073741819 (0xC0000005)

If you need more details about the structure of the BTree which reproduces the problem, I can share them with you (unfortunately I am not able to construct a new one which reproduces this problem deterministically, but I have a copy of one from a ZODB).

d-maurer commented 5 years ago

Is the problem reproducible on the result of pickling and then unpickling your tree? If it is, you could attach the pickle of your tree to this issue and we would be able to analyse the tree. If it is not, then pickling and unpickling already gives you a way to mostly repair your tree. Of course, it is possible that your tree can no longer be pickled at all.

Note that you should also tell which ZODB version you are using.

A BTree consists of two parts: a tree-shaped access structure and a linked list of buckets. Your observation indicates that the access structure is damaged. Fortunately, all the relevant information is contained in the linked bucket list. You can rebuild the tree from this list: create a new tree, iterate over the bucket list and put the buckets' contents into the new tree. Thus, you will be able to recover. Nevertheless, it might be interesting to learn in what way the structure is damaged.
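
A minimal sketch of that recovery, assuming an OOBTree (as in this issue) and the bucket-list attributes "_firstbucket"/"_next" that also appear in this thread; the helper name is illustrative:

from BTrees.OOBTree import OOBTree

def rebuild_from_buckets(broken_tree):
    # Rebuild a tree from its linked bucket list (illustrative helper).
    new_tree = OOBTree()
    bucket = broken_tree._firstbucket
    while bucket is not None:
        # Copy every key/value pair; the bucket list holds all the data.
        for key, value in bucket.items():
            new_tree[key] = value
        bucket = bucket._next
    return new_tree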

matthchr commented 5 years ago

@d-maurer thanks for the quick response. We're using ZODB 5.5.1, BTrees 4.5.1

Yes, the problem persists through pickling -- even modifying the broken BTree in a variety of ways (such as removing other items and then re-persisting it) doesn't make the issue go away.

I've attached the BTree in question saved into a basic ZODB with the same version mentioned above, located at matthchr.zip -- unzip it and it contains matthchr.db.

You can crack it open like so:

    import ZODB

    zodb = ZODB.DB('matthchr.db')
    conn = zodb.open()
    repro = conn.root()['a']

I'm not super interested in repairing this (there's not any critical data in the BTree which I am afraid of losing) -- more in understanding either what we did wrong to cause this, or if there is a bug in ZODB/BTrees which needs to be fixed.

d-maurer commented 5 years ago

Matthew Christopher wrote at 2019-3-15 14:59 -0700:

@d-maurer thanks for the quick response. We're using ZODB 5.5.1, BTrees 4.5.1

...

I'm not super interested in repairing this (there's not any critical data in the BTree which I am afraid of losing) -- more in understanding either what we did wrong to cause this, or if there is a bug in ZODB/BTrees which needs to be fixed.

I have found what is wrong with your tree.

From the documentation (--> "Development.txt" in the "BTrees" package): a bucket that is part of a tree is never empty.

In your tree, this condition is violated: the last bucket is empty. You can verify this by looking at its state, repro.__getstate__(). This is a pair whose first element represents the first level of the tree structure and whose second element is the reference to the first bucket. The structure is represented by a tuple of the form "b_0, s_1, b_1, ..., b_n-1, s_n, b_n", where the "b_i" are buckets and the "s_i" separators. "repro.__getstate__()[0][-1]" gives the empty "BTrees.OOBTree.OOBucket([])".
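
A short sketch of that inspection, assuming the tree is loaded as "repro" as above:

state = repro.__getstate__()   # a pair: (structure, first bucket)
structure = state[0]           # the tuple b_0, s_1, b_1, ..., s_n, b_n
last_bucket = structure[-1]
print(len(last_bucket))        # 0 -- the last bucket is empty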

That you see such an inconsistency likely indicates a bug in the "BTrees" code. The static state does not allow us to determine with certainty what this bug could be, but since conflict resolution is particularly difficult (and "BTrees" tries to resolve conflicts aggressively), a conflict resolution problem is a candidate.

Is your tree used in an environment with a high conflict risk (i.e. many concurrent writes)?

d-maurer commented 5 years ago

Matthew Christopher wrote at 2019-3-15 14:59 -0700:

... We're using ZODB 5.5.1, BTrees 4.5.1

... and Python 3.

matthchr commented 5 years ago

The instance where we saw this bug may have had concurrent writes. We actually do our best to avoid them via locking, but after I saw this I went and looked and realized there's at least one place where we weren't locking properly and could've allowed a concurrent write (an add with a simultaneous delete).

I don't think "many" would be accurate, though. At most, 2-3 concurrent writes would've happened, certainly not more than that.

I've seen 5-6 additional repros of this issue since I started looking for it. I'll make some changes to improve our locking and see if that resolves the problem, as a concurrency issue is what it felt like to me as well...

d-maurer commented 5 years ago

Matthew Christopher wrote at 2019-3-15 17:15 -0700:

... I've seen 5-6 additional repros of this issue since I started looking for it. I'll make some changes to improve our locking and see if that resolves the problem, as a concurrency issue is what it felt like to me as well...

I have tried to detect weaknesses in the conflict resolution which might lead to an empty bucket -- without success. The current "BTrees" version seems to detect these cases reliably and refuse to resolve the conflict in case of a risk.

Have your broken trees been created with a different "BTrees" version or did they go through a Python 2 --> Python 3 migration?

d-maurer commented 5 years ago

Matthew Christopher wrote at 2019-3-15 17:15 -0700:

... I've seen 5-6 additional repros of this issue since I started looking for it. I'll make some changes to improve our locking and see if that resolves the problem, as a concurrency issue is what it felt like to me as well...

The inconsistency of your tree is detected by BTrees.check.check:

>>> tree = c.root()["a"]
>>> check(tree)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/dieter/tmp/xmlsec3/python/lib/python3.6/site-packages/BTrees/check.py", line 424, in check
    Checker(btree).check()
  File "/home/dieter/tmp/xmlsec3/python/lib/python3.6/site-packages/BTrees/check.py", line 338, in check
    raise AssertionError(s)
AssertionError: Errors found in OOBTree (0xb718d70c oid=b'0xd1'):
key 'job-0000000386' >= upper bound 'job-0000000371' at index 1, in OOBucket (0xb71b341c oid=b'0xd8'), path from root 6

This indicates that the empty bucket might not be the primary problem but caused by the key "job-0000000386" being inserted into the wrong bucket.

I have used dm.historical to determine the history of this broken bucket:

>>> from dm.historical import getHistory
>>> from pprint import pprint as pp
>>> ts = tree.__getstate__()
>>> t1 = ts[0]
>>> bb = t1[-3]
>>> bb
BTrees.OOBTree.OOBucket([('job-0000000285', []), ('job-0000000386', [])])
>>> pp(getHistory(bb))
[{'description': '',
  'obj': BTrees.OOBTree.OOBucket([('job-0000000285', []), ('job-0000000386', [])]),
  'size': 88,
  'tid': b'\x03\xce`\x1c\xaf/\xcb3',
  'time': DateTime('2019/03/15 22:48:41.059381 GMT+1'),
  'user_name': ''},
 {'description': '',
  'obj': BTrees.OOBTree.OOBucket([('job-0000000285', []), ('job-0000000386', <persistent broken agent.task.data.JobPersistent instance b'\x00\x00\x00\x00\x00\x00\x00\xda'>)]),
  'size': 134,
  'tid': b'\x03\xce\xe3\xf2\xfa;\xaa',
  'time': DateTime('2019/03/15 21:51:56.947845 GMT+1'),
  'user_name': ''},
 {'description': '',
  'obj': BTrees.OOBTree.OOBucket([('job-0000000281', []), ('job-0000000285', []), ('job-0000000386', <persistent broken agent.task.data.JobPersistent instance b'\x00\x00\x00\x00\x00\x00\x00\xda'>)]),
  'size': 158,
  'tid': b'\x03\xce\xb73y\xc3\xbb',
  'time': DateTime('2019/03/15 21:07:12.064604 GMT+1'),
  'user_name': ''}]

Unfortunately, this means that the bucket had already been corrupted when the latest pack happened ("getHistory" can reconstruct an object's history only back to that point).

To find out more, you could periodically run check over your trees in order to detect inconsistencies within a short delay (at least, before you pack the storage).
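
For example, a small helper along these lines could be run periodically; how you iterate over your trees is application-specific and only illustrative here:

from BTrees.check import check

def check_trees(trees):
    # "trees" is assumed to be a mapping of name -> BTree, e.g. your ZODB root.
    for name, tree in trees.items():
        try:
            check(tree)
        except AssertionError as exc:
            print('inconsistency in tree %r: %s' % (name, exc))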

You could then use dm.historical to determine the history of the broken bucket. This way, you should be able to determine the transaction which damaged the bucket. You should also be able to determine the previous state(s) of this bucket and of the containing tree. By comparing the damaged state with its predecessor(s), we might be able to guess which tree operation(s) caused the damage.

The "description" field in the history above is always empty. This indicates, that you are not using "transaction metadata" (--> "Transaction.note"). You could use "Transaction.note" to provide information about what the transaction has done on a more global level.

All that information might make it possible to reproduce the damage and then understand why it happens.

d-maurer commented 5 years ago

Matthew Christopher wrote at 2019-3-15 17:15 -0700:

... I've seen 5-6 additional repros of this issue since I started looking for it. I'll make some changes to improve our locking and see if that resolves the problem, as a concurrency issue is what it felt like to me as well...

The state of your corrupted tree could have been caused by two concurrent inserts where one insert splits a bucket after the other insert has decided to insert into this bucket.

Usually, the ZODB prevents concurrency problems by giving each thread its own private copy of a persistent object and checking for potential concurrency problems during commit. If a potential conflict is detected, it raises a ConflictError and the potentially conflicting transaction is usually retried.
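
The usual retry loop looks roughly like this (a sketch, not code from this issue):

import transaction
from ZODB.POSException import ConflictError

for attempt in range(3):
    try:
        # ... modify persistent objects through this thread's connection ...
        transaction.commit()
        break
    except ConflictError:
        # Give up this attempt and redo the whole unit of work.
        transaction.abort()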

This concurrency-control approach will not work when the application passes persistent objects (more precisely, the copies destined for one thread) across threads. In that case, multiple threads can modify the same copy concurrently and easily create inconsistencies. Are you sure that your application does not do this? If multiple threads need access to the same persistent object, each one must get its own local copy by loading it (directly or indirectly) from its own (i.e. thread-specific) ZODB connection.
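
A sketch of that pattern, reusing the database from this issue; the worker function and thread count are illustrative:

import threading
import transaction
import ZODB

db = ZODB.DB('matthchr.db')        # one DB object may be shared by all threads

def worker():
    conn = db.open()               # but each thread opens its own connection
    try:
        tree = conn.root()['a']    # this thread's own copy of the tree
        # ... modify "tree" here ...
        transaction.commit()       # the default transaction manager is thread-local
    finally:
        conn.close()

threads = [threading.Thread(target=worker) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()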

The ZODB allows objects to resolve concurrency problems locally (called "conflict resolution") - and the BTrees buckets make intensive use of this possibility.
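
As an illustration of the mechanism (a toy counter in the spirit of "BTrees.Length", not the BTrees code itself):

import persistent

class Counter(persistent.Persistent):
    def __init__(self):
        self.value = 0

    def change(self, delta):
        self.value += delta

    def _p_resolveConflict(self, old_state, saved_state, new_state):
        # Merge two concurrent changes relative to the common ancestor state.
        resolved = dict(new_state)
        resolved['value'] = (saved_state['value'] + new_state['value']
                             - old_state['value'])
        return resolved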

Conflict resolution is intrinsically difficult because it has only local knowledge: it sees only a single conflicting object and does not know whether there are conflicts in other objects; it does not know in which context the object is used (e.g. whether the bucket forms part of a larger tree); it knows the object's children only partially (for persistent children it knows only that they are persistent and their oid, not their "value"); and the resolved state must be correct for all conflicting transactions.

Old BTrees versions had a conflict resolution which gave wrong results for some important applications. As far as I know, there are no such problems known for newer versions. I have triggered "insert/split" conflicts and the conflict resolution of the current BTrees version (the one you are currently using) has correctly detected that it cannot handle this case. Thus, apparently, your broken trees should not be the result of a conflict resolution bug.