matthchr closed this issue 3 years ago.
With respect to the "lost" buckets, I just did:
r = tree['747b6e77-3fe6-4155-afe6-b217505f4215']
(which was one of the lost keys) and somehow that actually worked - I can't seem to find the bucket where this key is stored, but if I look it up by specific key the BTree is able to get me the value. Not sure if that adds clarity or confuses things further...
Matthew Christopher wrote at 2020-1-10 14:58 -0800:
...
From what I can tell, 0 size buckets in the "middle" of the bucket list is not expected and is what triggers the exception I'm getting. Additionally if you look closely, it seems like a large swath of buckets is lost, because there should be a relatively even distribution of GUIDs as keys (and I see that from first digit 0-4 and first digit 9-f), but first digit 5-8 is missing.
My questions are:
- Why doesn't `BTrees.check.check` notice this?
Likely an omission.
- What could cause this type of corruption?
Your report reminds me of an earlier report. That, too, reported a `BTrees` corruption - involving concurrency.
I suggested to use `dm.historical` to find which transaction introduced the corruption (Zope 2 usually writes sufficient transaction metadata to get a feeling what BTrees operations could have been performed). Of course, this requires that the corruption was introduced after the last (effective) pack time.
@d-maurer - Yes, after the previous issue we had, we significantly improved our locking. All BTree writes are now done under a lock - so it doesn't seem as likely that this is a simple corruption issue due to multi-threading (at least not with writes).
We also did observe that the previous issue, where the BTree would show us a key (in, say, `for item in tree`) but be unable to access the value of that key when doing `tree[key]`, seems to have been mitigated by locking.
There are a few possibilities we are wondering about in the same space though:
- We call `zodb.pack()` periodically, and that is not currently under a lock. Is it possible that we're introducing some sort of corruption by packing? My read of the `pack` documentation suggests that's probably unlikely, but it's one avenue I'm investigating.

Also, from the previous issue you had suggested:
If multiple threads need access to the same persistent object, each one must get its own local copy by loading it (directly or indirectly) from a local (i.e. thread specific) ZODB connection.
Our application uses `async/await` as its primary form of asynchrony - which actually means there's really only one thread in play (the one running the asyncio loop). That thread is the only thread with a `Connection` at all (IIRC by default connections are associated to the thread they're opened on) and so I don't think there's any conflict resolution going on at that level.
I will definitely improve our logging to detect this case as well (`check` missed it so we'll need some specific tests for this, I think). Am I correct in assuming that there should be no empty buckets in the bucket linked-list?
we significantly improved our locking
So you must be using coarse-grained cross-process locking? Because writes can occur from any process at any time. That seems very expensive and counter-productive. Or you just have a single process...
Our application uses async/await as its primary form of asynchrony - which actually means really there's only one thread in play (the one running the asyncio loop). That thread is the only thread with a Connection at all (IIRC by default connections are associated to the thread they're opened on) and so I don't think there's any conflict resolution going on at that level.
Sorry, but if the application is sharing Persistent object instances across concurrency contexts it is unbelievably architecturally broken. That is to say, any object instance produced by a `Connection` must only ever be used by a single "thread" at a time (as must the `Connection`). Introducing any chance of doing otherwise is just not dreamed of by ZODB.
Perhaps I've misunderstood, and there are indeed extra guarantees in place in this architecture to ensure that the obvious errors of mixing concurrency and `Connection` don't happen. If that's the case, please forgive me.
Matthew Christopher wrote at 2020-1-13 13:59 -0800:
@d-maurer - Yes, after the previous issue we had, we significantly improved our locking. All BTree writes are now done under a lock - so it doesn't seem as likely that this is a simple corruption issue due to multi-threading (at least not with writes).
The ZODB was designed to be used without explicit locking. Thus, if used correctly, locking should not be necessary.
We also did observe that the previous issue, where the BTree would show us a key (in, say, `for item in tree`) but be unable to access the value of that key when doing `tree[key]`, seems to have been mitigated by locking.

There are a few possibilities we are wondering about in the same space though:
- We call `zodb.pack()` periodically, and that is not currently under a lock. Is it possible that we're introducing some sort of corruption by packing? My read of the `pack` documentation suggests that's probably unlikely, but it's one avenue I'm investigating.
Packing should not introduce corruptions. If anything, it may only lose whole transactions; it should be unable to corrupt the internal state of persistent objects.
- Only writes are holding the lock right now, with the assumption that a write + read happening in parallel wouldn't corrupt the BTree - is that assumption flawed? Should we lock all reads and writes?
As indicated above, no locks should be necessary.
Also, from the previous issue you had suggested:
If multiple threads need access to the same persistent object, each one must get its own local copy by loading it (directly or indirectly) from a local (i.e. thread specific) ZODB connection.
Our application uses `async/await` as its primary form of asynchrony - which actually means there's really only one thread in play (the one running the asyncio loop). That thread is the only thread with a `Connection` at all (IIRC by default connections are associated to the thread they're opened on) and so I don't think there's any conflict resolution going on at that level.
What I have formulated above for threads is actually a requirement for concurrent activities controlling a transaction. In the standard setup, a transaction is bound to a controlling thread. In your case, however, your coroutines are likely the concurrent activities controlling the transactions and must fulfill the requirement.
If the requirement is not fulfilled, you may lose important transaction features: effects of different concurrent activities may get committed together, leading to an inconsistent state. Another effect may be that some modifications by one concurrent activity are wiped out by an "abort" from a different concurrent activity, while the first activity then makes further modifications that are only valid if its former modifications were effective - again leading to inconsistent state.
The inconsistencies mentioned above are application level inconsistencies. I do not think that not fulfilling the requirement can cause data structure level inconsistencies. HOWEVER, I have seen (very rare) cases where READING (!) a BTree structure from a foreign thread has caused a SIGSEGV: apparently, some (internal) `BTree` operations are performed without holding the GIL and interfering reads can be disastrous. However, coroutine concurrency should not introduce such behaviour (as it does not interrupt internal BTree operations).
@jamadden asked:
So you must be using coarse-grained cross-process locking? Because writes can occur from any process at any time. That seems very expensive and counter-productive. Or you just have a single process...
There's only a single process, a single asyncio event loop, and a single active `Connection`. Locks are just in-memory locks.
Sorry, but if the application is sharing Persistent object instances across concurrency contexts it is unbelievable architecturally broken. That is to say, any object instance produced by a Connection must only ever be used by a single "thread" at a time (as must the Connection). Introducing any chance of doing otherwise is just not dreamed of by ZODB.
asyncio only provides cooperative yielding/scheduling. Are you saying that the following is not allowed:

    async def test_async(zodb_connection):
        await some_async_operation()
        zodb_connection.root['a'].persistent_thing.x = 1
        transaction.commit()

    async def test_async_2(zodb_connection):
        await some_async_operation()
        zodb_connection.root['a'].persistent_thing.x = 2
        transaction.commit()

    # Example usage
    await test_async(conn)
    ...
    await test_async_2(conn)
That is to say, any object instance produced by a Connection must only ever be used by a single "thread" at a time (as must the Connection).
Provided when you say "Thread" here you mean Python thread (and not some lower level primitive or logical concept), we aren't violating this - asyncio also only ever runs coroutines on a single Python thread. Interruptions can only occur when an `await` is reached, and the interrupting coroutine is also run on the same thread.
@d-maurer says:
If the requirement is not fulfilled, you may lose important transaction features: effects of different concurrent activities may get committed together, leading to an inconsistent state. Another effect may be that some modifications by one concurrent activity are wiped out by an "abort" from a different concurrent activity, while the first activity then makes further modifications that are only valid if its former modifications were effective - again leading to inconsistent state.
These are good points - as you say though, these are application level inconsistencies which we manage:
- We don't call `abort` currently
- `transaction.commit()` is done in the coroutine which made the modification
- There are no `await`'s between the modification of a persistent object and a transaction being committed (it's done within a `with` block)

Taking a step back - the entire point of doing what we're doing is basically because we observed opening/closing `Connection`s frequently to be expensive. It's not tenable for our application to open a fresh connection for every single coroutine where we may make a modification due to the overall performance impact of doing so - we open a connection, do many transactions/modifications and then close the connection (which triggers persistence to disk, I believe) at a defined time.
My understanding was that this pattern, while strange, is not fundamentally against the architecture of ZODB/BTrees - is that not correct?
Matthew Christopher wrote at 2020-1-14 09:08 -0800:
... Taking a step back - the entire point of doing what we're doing is basically because we observed opening/closing `Connection`s frequently to be expensive.
I am using the ZODB in the context of Zope. In this context, a (at least one, sometimes several) connection is opened/closed for each request. I have not observed that this causes a significant overhead for request processing.
Note that `ZODB.DB.DB` uses a connection pool to make opening new connections efficient (the primary purpose is to reuse the cache associated with the connection; the open itself is fast anyway). You should see significant degradations only when you open more concurrent connections than the pool size.
... My understanding was that this pattern, while strange, is not fundamentally against the architecture of ZODB/BTrees - is that not correct?
Reading persistent objects, too, may (or may not) be important for application level consistency as explained below.
For a transactional system, the notion of "transaction isolation" is important. This "measures" the degree of transaction independence. The best is "serial": with this isolation, the execution of a set of transactions either fails or has the same effect as some serial (rather than concurrent) execution of the transactions.
The ZODB implements a weaker transaction isolation: so called "snapshot isolation". In this model, a transaction sees the state when the transaction started. Conflicts introduced by concurrent modifications to individual objects are recognized during commit and cause some affected transaction[s] to be aborted.
If you have two coroutines C1 and C2 with:
C1: ... ... read persistent object ... ... ... context switch ... ... ... read persistent object ...
and
C2: ... ... write persistent object ... ... commit ... context switch ...
then "C1" may read inconsistent data. The first "read" reads state before the commit; the second after the commit.
Such inconsistent reads may or may not be a problem for your application. In no case should they be responsible for the physical corruption you have observed.
Recently, I have fixed a `BTree` inconsistency in a system of a client. The system has been operational for about 10 years and this has been the first detected inconsistency. I tend to think that this might have been the effect of less than complete reliability: computers and storage media are highly reliable but nevertheless, very rarely, errors may happen. If such an error affects persistent data, its effect will remain...

Maybe, you, too, see the effect of a random error?

At your place, I would fix the corrupt `BTree` and start worrying only should a similar problem reoccur in a short to medium timeframe.
We have finally been able to get a somewhat reliable repro of this issue... appreciate your leaving this issue open so long even with no activity.
We've found a way to generate load in our application which produces this issue somewhat consistently. I have shared a ZODB file with the full transaction history available (no packs have been done). I followed the advice given in #102 by @d-maurer and used `dm.historical` to look through the transaction history for the BTree in question.
Here's what I found (caveat that I am not an expert here so I was learning as I went):
- The last good history entry (found via `dm.historical`) was at 2020/08/19 23:38:10.622771 and had `_p_estimated_size: 32192`
- The first bad history entry (found via `dm.historical`) was at 2020/08/19 23:38:12.419679 and had `_p_estimated_size: 32000`
- The bucket that becomes corrupt has oid `b'\x00\x00\x00\x00\x00\x00\r\x91'`
I took a deeper look at the difference in history between the last good history entry and the first bad one and what I observed was that we removed keys from 4 buckets between the two entries. Specifically these keys:
bucket 118 differs
1 keys only in lhs {'testaccount1batchtest18 22F1F921291938C1$abs-2-job406589010-2 22F1E3E288AA12F8$job-1$abs-2-job406589010-2-task-63$0:662512081'}
bucket 119 differs
1 keys only in lhs ({'testaccount1batchtest18 22F1F921291938C1$abs-2-job406589010-2 22F1E3E288AA12F8$job-1$abs-2-job406589010-2-task-67$0:662512081'})
bucket 159 (this is the bucket that becomes corrupt) differs
1 keys only in lhs ({'testaccount1batchtest18 22F1F921291938C1$abs-2-job406589010-4 22F1E3E288A8B2FD$job-1$abs-2-job406589010-4-task-205$0:662512051'})
bucket 160 differs
2 keys only in lhs ({'testaccount1batchtest18 22F1F921291938C1$abs-2-job406589010-4 22F1E3E288A8B2FD$job-1$abs-2-job406589010-4-task-235$0:662512051', 'testaccount1batchtest18 22F1F921291938C1$abs-2-job406589010-4 22F1E3E288A8B2FD$job-1$abs-2-job406589010-4-task-234$0:662512051'})
The net of this is that basically we removed 5 keys from 4 buckets, with the key in bucket 159 being the last key. That bucket (oid `b'\x00\x00\x00\x00\x00\x00\r\x91'`) didn't disappear like it was (I think) supposed to.
The ZODB containing the BTree in question is in this zip.
There are multiple BTrees in the DB: The one with the issue is: btree = conn.root()['agent.task.history'].history
I've posted the code I used to gather this data here.
Here are the details of our application and the situations that this reproduces in:
I have tried to produce a synthetic test that reproduces this issue outside of the context of our application (no PyInstaller, etc), but I haven't been able to do so. I am hoping that somebody can take a look at the btree in question and maybe something jumps out at you about what specifically is going on with this transaction that is suspicious. With that information maybe I could write a test that reproduced the problem without all of the context/cruft that our application brings, since right now the only way I have to reproduce this is a long and somewhat cumbersome process that involves running our entire application for a while, driving particular load at it, and hoping the issue reproduces.
I also just now realized that @jamadden never actually clarified his usage of the word Thread w.r.t the comments previously - it could be that our application is unbelievably architecturally broken, if that's the case I'd like to know!
Matthew Christopher wrote at 2020-8-26 15:30 -0700:
We have finally been able to get a somewhat reliable repro of this issue... appreciate your leaving this issue open so long even with no activity. ...
- The application is entirely using Python async/await, so as discussed above there is some concurrency in the application but no true parallelism.
This is potentially problematic -- unless you ensure that each coroutine uses its own ZODB connection and the transactions are coroutine specific.
The transaction manager used by ZODB connections is by default thread specific, not coroutine specific. Thus, you would need to pass a special transaction manager when you open your ZODB connections.
Failure to ensure this may lead to inconsistent persistent data. I am not sure whether it can lead to a physically damaged tree.
Matthew Christopher wrote at 2020-8-26 15:31 -0700:
I also just now realized that @jamadden never actually clarified his usage of the word Thread w.r.t the comments previously - it could be that our application is unbelievably architecturally broken, if that's the case I'd like to know!
"thread" has the narrow meaning of a "Python thread" (see the Python documentation for its modules `_thread` and `threading`). The coroutines associated with "async/await" are not threads in this sense.
By default, the ZODB targets a multi-thread application and makes assumptions about how threads are using ZODB connections:
- each thread must use its own ZODB connection -- a single ZODB connection (and its objects) cannot be shared by different threads
- a transaction must be controlled by an individual thread -- i.e. the commit in one thread should not cause the commit of modifications performed in a different thread.
The coroutines associated with "async/await" are similar to threads. However, they, too, must satisfy the conditions formulated above for threads if they want to use the ZODB safely. Especially, they must use a special transaction manager which binds the transactions to coroutines rather than threads.
Matthew Christopher wrote at 2020-8-26 15:30 -0700:
... I took a deeper look at the difference in history between the last good history entry and the first bad one and what I observed was that we removed keys from 4 buckets between the two entries. ... The net of this is that basically we removed 5 keys from 4 buckets, with the key in bucket 159 being the last key. That bucket (oid `b'\x00\x00\x00\x00\x00\x00\r\x91'`) didn't disappear like it was (I think) supposed to.
You are right. The bucket should have been removed from the tree.
Your problem may have a stronger relation to #68 than originally assumed: even for the good tree (which still can be iterated over) `btree._check()` reports `AssertionError: Bucket next pointer is damaged`.
Thus, your observed problem (bucket not removed) might be a consequence of an earlier bug (damaged bucket chain).
The ZODB containing the BTree in question is in this zip. There are multiple BTrees in the DB: The one with the issue is: `btree = conn.root()['agent.task.history'].history`. I've posted the code I used to gather this data here.
For those who want to join the analysis, I used the following code:
    from pprint import pprint as pp
    from ZODB.DB import DB
    from ZODB.FileStorage import FileStorage

    s = FileStorage("agent.db")
    db = DB(s)
    c = db.open()
    o = c.root()['agent.task.history']
    o._p_activate()
    t = o.__Broken_state__["history"]

    from dm.historical import *
    b = c[b'\x00\x00\x00\x00\x00\x00\r\x91']
    bad = getObjectAt(t, b'\x03\xda\x0bj4\xfd\x99\xee')   # no longer iterable
    good = getObjectAt(t, b'\x03\xda\x0bj4\xfd\x99\xe0')  # still iterable

    from BTrees.check import *
    top = crack_btree(bad, True)
    bb_p = top[2][157]  # predecessor of bad bucket
    bb = top[2][158]    # bad bucket
    good._check()
... From what I can tell, 0 size buckets in the "middle" of the bucket list is not expected and is what triggers the exception I'm getting. ... My questions are:
1. Why doesn't `BTrees.check.check` notice this?
This is checked (among others) by the `_check` method of btrees, not by the `check` function.
Dieter Maurer wrote at 2020-8-28 08:42 +0200:
Matthew Christopher wrote at 2020-8-26 15:30 -0700:
... Your problem may have a stronger relation to #68 than originally assumed: even for the good tree (which still can be iterated over) `btree._check()` reports `AssertionError: Bucket next pointer is damaged`. Thus, your observed problem (bucket not removed) might be a consequence of an earlier bug (damaged bucket chain).
I have analysed this further - and indeed, the "bucket not removed" is likely a direct consequence of the damaged bucket chain: almost surely, a different bucket got removed (the one in the chain at the position of the emptied bucket).
I used the following code (following code previously posted):
    gt = crack_btree(good, True)
    for i in range(len(gt[1])):
        if gt[2][i].__getstate__()[1] is not gt[2][i+1]:
            print(i)
            break
The result was `157`; this is the position before the emptied bucket which was not removed (as it should have been).
Dieter Maurer wrote at 2020-8-28 09:28 +0200:
Dieter Maurer wrote at 2020-8-28 08:42 +0200:
Matthew Christopher wrote at 2020-8-26 15:30 -0700:
... Your problem may have a stronger relation to #68 than originally assumed: even for the good tree (which still can be iterated over) `btree._check()` reports `AssertionError: Bucket next pointer is damaged`. Thus, your observed problem (bucket not removed) might be a consequence of an earlier bug (damaged bucket chain).

I have analysed this further - and indeed, the "bucket not removed" is likely a direct consequence of the damaged bucket chain: almost surely, a different bucket got removed (the one in the chain at the position of the emptied bucket).
I used the following code (following code previously posted):
    gt = crack_btree(good, True)
    for i in range(len(gt[1])):
        if gt[2][i].__getstate__()[1] is not gt[2][i+1]:
            print(i)
            break
The result was `157`; this is the position before the emptied bucket which was not removed (as it should have been).
The "damaged bucket chain" was introduced by the following transaction:
{'description': '',
'obj': <BTrees.OOBTree.OOBTree object at 0x7f8b3665e158 oid 0x8 in <Connection at 7f8b366cdeb8>>,
'size': 2961,
'tid': b'\x03\xda\x0bi 2Wf',
'time': DateTime('2020/08/20 01:37:7.546089 GMT+2'),
'user_name': ''}
This is history record "5186" in the reversed history obtained from `dm.historical.generateBTreeHistory(t)`.
The previous transaction (where the bucket chain is still okay) is:
{'description': '',
'obj': <BTrees.OOBTree.OOBTree object at 0x7f8b3665e1e0 oid 0x8 in <Connection at 7f8b366cdf60>>,
'size': 2661,
'tid': b'\x03\xda\x0bi\x02\xa9\xce\xbb',
'time': DateTime('2020/08/20 01:37:0.624213 GMT+2'),
'user_name': ''}
In the bad transaction, the bucket following the bucket with the bad `nextbucket` pointer was emptied, the `nextbucket` pointer of the previous bucket was correctly updated (and now points to the bucket following the emptied bucket) BUT the emptied bucket was not removed from the parent btree.
I have looked at the code: responsible is `BTreeTemplate.c:_BTree_set`. While the bucket chain update is quite far away from the `data` update, I have not seen how it could be possible to "forget" the `data` update.
I will see whether I can reproduce the behavior on my linux platform.
I will see whether I can reproduce the behavior on my linux platform.
I was unable to emulate the application specific modules/classes -- a requirement for the reproduction on a foreign system. @matthchr can you provide such an emulation (or even the true code)?
The following code checks deletion of a bucket:
from BTrees.OOBTree import OOBTree, OOBucket
from copy import deepcopy
b2 = OOBucket(dict.fromkeys(range(20, 22)))
b1 = OOBucket()
b1.__setstate__(((10, None), b2))
b0 = OOBucket()
b0.__setstate__(((0, None), b1))
t = OOBTree()
t.__setstate__(((b0, 10, b1, 20, b2), b0))
t._check()
tc = deepcopy(t)
del tc[10]
tc._check()
No "damaged bucket chain" -- at least not under Linux.
@d-maurer thanks for the in-depth investigation. I hadn't realized that `_check` did something different than `BTrees.check.check`.

One thing I noticed when walking through the history myself is that my `dm.historical` is showing me far fewer history entries than you seem to be seeing. You mentioned seeing 5k+, but I only see ~500. What version of `dm.historical` are you using? I'm using version `2.0.1`, and I think that this version is missing certain history entries. I wrote a simple test where I just committed 2000 transactions (each labeled with a number), and while it had each of the first 30 or 40, after that it seemed to skip a bunch between history entries (i.e. one entry would be txn 67, the next 93).
No "damaged bucket chain" -- at least not under Linux.
I ran the code snippet you shared on my Windows environment and didn't see any issue either.
I was unable to emulate the application specific modules/classes -- a requirement for the reproduction on a foreign system. @matthchr can you provide such an emulation (or even the true code)?
I haven't been able to produce a pared-down example which can emulate this problem, and unfortunately I can't share the actual code in question.
Before we get too far, we should revisit this:
This is potentially problematic -- unless you ensure that each coroutine uses its own ZODB connection and the transactions are coroutine specific.
The transaction manager used by ZODB connections by default are thread, not coroutine specific. Thus, you would need to pass a special transaction manager when you open your ZODB connections.
Failure to ensure those may lead to inconsistent persistent data. I am not sure whether it can lead to a physically damanage tree.
We are not using a coroutine specific transaction manager when we open connections - I was (mistakenly?) under the assumption that since we weren't actually multi-threaded, we could just use the default transaction manager. As you mentioned, that defaults to a transaction per-thread - since we only have one thread there is at most a single transaction ongoing at a given time.
Can you explain a bit more about the mechanics of how this is problematic? I am failing to understand what the difference is in these two pieces of code from the perspective of ZODB/BTrees:
    async def test(btree):
        await t1(btree)
        await t2(btree)
        await t3(btree)
        transaction.commit()

    async def t1(btree):
        btree['foo'] = 5

    async def t2(btree):
        btree['bar'] = 6

    async def t3(btree):
        btree['baz'] = 7

    def test2(btree):
        btree['foo'] = 5
        btree['bar'] = 6
        btree['baz'] = 7
        transaction.commit()
Obviously this example is overly simplistic because it assumes that execution in an async context is always well-ordered, which it's not. Even with a more complicated case where we're reacting to some outside event (a web request or something), such that there may be multiple calls ongoing and order is not guaranteed, I still am not seeing how that's any different than making a bunch of sequential changes and commits in a non-async context.
One issue (which I do understand), is that if we do something like this:
    async def a():
        # modify some btree state (part 1)
        await <something>
        # modify some more btree state (part 2)
        transaction.commit()
and have the expectation that both part 1 and part 2 will always be committed together, that is not correct (because we awaited in the middle and something else could have committed during that await). I don't understand how this could corrupt the btree though (because as far as ZODB/BTrees is concerned we've just done a bunch of sequential commits).
Can you clarify if you think that our lack of coroutine-specific transactions is the cause of this corruption, or if you are just cautioning against that because of possible logic errors?
Matthew Christopher wrote at 2020-8-28 11:37 -0700:
@d-maurer thanks for the in-depth investigation. I hadn't realized that `_check` did something different than `BTrees.check.check`. One thing I noticed when walking through the history myself is that my `dm.historical` is showing me far fewer history entries than you seem to be seeing. You mentioned seeing 5k+, but I only see ~500.
I used `generateBTreeHistory` (which contained a Python 3 incompatibility -- now fixed with new version 2.0.2).

Note that each persistent object has its own history - independent of that of persistent subobjects. `getHistory` (and `generateHistory`) accesses this history. `generateBTreeHistory` tries to take the history of all persistent `BTree` and `Bucket` subobjects into account (but it, too, can miss some history records).
What version of `dm.historical` are you using?

2.0.2 (released today).

I'm using version `2.0.1`, and I think that this version is missing certain history entries. I wrote a simple test where I just committed 2000 transactions (each labeled with a number), and while it had each of the first 30 or 40, after that it seemed to skip a bunch between history entries (i.e. one entry would be txn 67, the next 93).
When you made your test with a `BTree`, then initially there is a single persistent object - and `getHistory` gives you the full history. When the tree becomes larger, it grows persistent subobjects. Modifications to those subobjects (alone) are not reflected in the history of the tree - only modifications of the (top level) tree structure show in the tree's history. `generateBTreeHistory` is there to take modifications of sub tree structures into account.
No "damaged bucket chain" -- at least not under Linux.
I ran the code snippet you shared on my Windows environment and didn't see any issue either.
I feared that: `BTrees` comes with extensive tests which are run on Windows as well. Almost surely, there is a test which tests bucket deletion.
I was unable to emulate the application specific modules/classes -- a requirement for the reproduction on a foreign system. @matthchr can you provide such an emulation (or even the true code)?
I haven't been able to produce a paired down example which can emulate this problem, and unfortunately I can't share the actual code in question.
I do not need the real code - just the relevant module/class structure. Below is the emulation I have tried:
from types import ModuleType
m = ModuleType("agent.task.history")
m.task = m.history = m.defs = m
from sys import modules
modules[m.__name__] = modules["agent"] = modules["agent.task"] = modules["agent.task.defs"] = m
from persistent import Persistent
class TaskHistoryInfo(Persistent): pass
TaskHistoryInfo.__module__ = "agent.task.history"
m.TaskHistoryInfo = TaskHistoryInfo
class TaskHistoryData(Persistent): pass
TaskHistoryData.__module__ = "agent.task.history"
m.TaskHistoryData = TaskHistoryData
class TaskKind(object): pass
TaskKind.__module__ = "agent.task.history"
m.TaskKind = TaskKind
It is unfortunately not sufficient: trying to copy the last correct tree gives an `object() takes no arguments` error. Likely, the base class of one of the above classes is wrong.
Before we get too far, we should revisit this:
This is potentially problematic -- unless you ensure that each coroutine uses its own ZODB connection and the transactions are coroutine specific.
Bugs of this kind should not damage the physical structure of the tree. In your case, the physical structure is damaged: for some (still unknown) reason the bucket chain was updated but not the actual tree. Whatever you do (wrong) at the application level, you should not be able to achieve this.
The code can skip the tree update -- but only in case of errors and those should result in an exception. Maybe, under some rare cases (triggered by your tree) an error occurs which (wrongfully) is not transformed into an exception. Debugging will be necessary to find this out.
... Can you explain a bit more about the mechanics of how this is problematic?
An important ZODB aim is to make multi-thread applications (fairly) safe without the need of explicit locking. It uses two techniques for this: "transaction"s and local copies.
Each connection has (logically) its own copies of the objects in the ZODB. Modifications affect the copies, not the persistent data directly. Only when the transaction is committed, modifications are transfered from the local copies to the persistent data.
When several threads share the same transaction, then the transaction commit or abort stores or rolls back not only the modifications of a single thread but of different threads - and usually, the thread issuing the transaction control does not know about the state of other threads. As a consequence, modifications may be persisted (or reverted) in an uncontrolled way.
When a single connection is shared by several threads, they operate on the same local copies and can interfere with one another.
What I wrote about threads above applies also to coroutines (or other concurrent activities).
I am failing to understand what the difference is in these two pieces of code from the perspective of ZODB/BTrees:
```python
async def test(btree):
    await t1(btree)
    await t2(btree)
    await t3(btree)
    transaction.commit()

async def t1(btree):
    btree['foo'] = 5

async def t2(btree):
    btree['bar'] = 6

async def t3(btree):
    btree['baz'] = 7
```
```python
def test2(btree):
    btree['foo'] = 5
    btree['bar'] = 6
    btree['baz'] = 7
    transaction.commit()
```
Obviously this example is overly simplistic because it assumes that execution in an async context is always well-ordered, which it's not. Even with a more complicated case where we're reacting to some outside event (a web request or something), such that there may be multiple calls ongoing and order is not guaranteed, I still am not seeing how that's any different than making a bunch of sequential changes and commits in a non-async context.
Let's see:
Assume the `await t2(btree)` causes a coroutine switch and the foreign coroutine calls `transaction.abort()`. Then the effect of `t1(btree)` will be undone - a thing completely unexpected by the task.
The ZODB's aim is to ensure that in a single thread/coroutine you can forget that there are other threads/coroutines. If the transactions are not thread/coroutine specific, this is obviously no longer the case.
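The abort scenario can be sketched with a toy buffered transaction shared by two asyncio coroutines. `ToyTransaction` is hypothetical and only mimics commit/abort buffering semantics; it is not the real `transaction` package:

```python
import asyncio

# Toy stand-in for a transaction shared between coroutines: writes are
# buffered and applied on commit(); abort() discards the buffer.
class ToyTransaction:
    def __init__(self, store):
        self.store = store
        self.buffer = {}

    def commit(self):
        self.store.update(self.buffer)
        self.buffer.clear()

    def abort(self):
        self.buffer.clear()

store = {}
txn = ToyTransaction(store)

async def t1():
    txn.buffer['foo'] = 5        # t1's pending modification
    await asyncio.sleep(0)       # switch point: another coroutine runs here
    txn.commit()                 # buffer was cleared meanwhile -- nothing committed

async def other():
    txn.abort()                  # undoes t1's buffered write

async def main():
    await asyncio.gather(t1(), other())

asyncio.run(main())
print(store)  # {} -- t1's change was silently discarded by the foreign abort
```

The foreign `abort()` between t1's write and its `commit()` is exactly the "completely unexpected" undo described above.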
One issue (which I do understand), is that if we do something like this:
```python
async def a():
    # modify some btree state (part 1)
    await <something>
    # modify some more btree state (part 2)
    transaction.commit()
```
and have the expectation that both part 1 and part 2 will always be committed together, that is not correct (because we awaited in the middle and something else could have committed during that await). I don't understand how this could corrupt the btree though (because as far as ZODB/BTrees is concerned we've just done a bunch of sequential commits).
It should not be possible to corrupt the physical btree structure (unless a coroutine switch would happen as part of object destruction -- I believe that this is impossible).
Can you clarify if you think that our lack of coroutine-specific transactions is the cause of this corruption,
I do not think so.
Not having coroutine specific transactions (or share connections between coroutines) may give you inconsistencies at the application level. It should not be able to cause physical damage to the btree structure itself (because there should be no coroutine switch during a single btree operation only between btree operations).
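The point that switches happen only between operations can be seen in a small asyncio sketch (hypothetical, illustrative only): the synchronous work between two `await` points is never interleaved with another coroutine's work.

```python
import asyncio

log = []

async def worker(name):
    # Everything between awaits is synchronous: no other coroutine can
    # run in the middle of this loop.
    for i in range(3):
        log.append((name, i))
    await asyncio.sleep(0)  # only here can control pass to the other worker
    log.append((name, "after-await"))

async def main():
    await asyncio.gather(worker("A"), worker("B"))

asyncio.run(main())
print(log)  # each worker's loop entries appear as an uninterrupted run
```

A single synchronous btree operation behaves like the loop body here: it completes atomically with respect to other coroutines on the same event loop.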
Matthew Christopher wrote at 2020-8-28 11:37 -0700:
... Can you clarify if you think that our lack of coroutine-specific transactions is the cause of this corruption, or if you are just cautioning against that because of possible logic errors?
Your issue is almost surely related to #68 -- and that almost surely has been observed in a standard multi-threaded application.
What you do is not right -- but it is not the cause of the observed damage.
because there should be no coroutine switch during a single btree operation only between btree operations
That's not out of the realm of possibility, though, is it? If the objects you're using as keys wind up switching in their `__cmp__` (`__eq__`, `__lt__`) methods, then what's supposedly sequential C code, protected by the GIL, is not. Your point about object destruction is also a good one: a `__del__` method implemented in Python can do just about anything, so a line of code like `btree['a'] = object()` could have whatever side-effect that `btree['a'].__del__` does; that might include switching. I wonder if we're looking at a case of re-entrancy (a btree method switching out, part way through, and then being called again by another operation).
(I'm far more familiar with gevent than with async/await. This is one place that the explicitness of async/await comes in handy. It's probably tough to arrange for switches in those methods without some trouble and several layers. )
Your issue is almost surely related to #68 -- and that almost surely has been observed in a standard multi-threaded application.
In fact, it's actually a gevent-based application, but it's using almost exclusively the standard-library as monkey-patched by gevent, and greenlets are treated just like threads: one Connection and transaction per request per greenlet. Persistent objects are never shared between "threads" of control without a great deal of care and following some rules (rule #1: Don't do that!) At least they're not supposed to be, so after this conversation I'm wondering if that might have been the root cause of #68.
@d-maurer Thanks for your explanation - I understand now.
@jamadden AFAIK it's not possible to actually yield control from a coroutine without an `await`, and I am not aware of a way to use that keyword in a method such as `__eq__`, `__lt__`, etc. Certainly we aren't doing that.
What you do is not right -- but it is not the cause of the observed damage.
Thanks for the clarification - I will look at making changes to use a connection/txn per-coroutine (I agree that there are clear flaws with what we're doing now, although we're managing the application inconsistencies that may arise from it), but good to hear that it shouldn't be causing corruption.
I do not need the real code - just the relevant module/class structure.
In that case, I may be able to help. Can you share the full snippet of code that you're trying to execute (with the mock object hierarchy + copy + usage of dm.historical?) If I get that I can probably fix it up and share it back with you if I can get it working
AFAIK it's not possible to actually yield control from a coroutine without an `await`, and I am not aware of a way to use that keyword in a method such as `__eq__`, `__lt__`, etc. Certainly we aren't doing that.
Yeah, it wouldn't be directly done, it would probably be something buried under layers of function calls. This simple example shows control going to different coroutines while inside of an `__eq__` method. If that happened during a BTree operation, and one of the switched-to coroutines was also using the BTree, I'd expect problems.
import asyncio
from BTrees import family64
async def coro2():
print("In coro 2")
print("Len of bt", len(bt))
async def thing():
print("-> Enter coro 1")
await coro2()
print("<- Exit coro 1")
def do_thing():
asyncio.run(thing())
class Foo:
def __eq__(self, other):
print("In __eq__")
import traceback; traceback.print_stack()
do_thing()
return True
__lt__ = __eq__
bt = family64.OO.BTree()
bt[Foo()] = 1
bt[Foo()] = 2
$ python /tmp/atest.py
In __eq__
File "/tmp/atest.py", line 28, in <module>
bt[Foo()] = 2
File "/tmp/atest.py", line 20, in __eq__
import traceback; traceback.print_stack()
-> Enter coro 1
In coro 2
Len of bt 1
<- Exit coro 1
$ PURE_PYTHON=1 python /tmp/atest.py
In __eq__
File "/tmp/atest.py", line 28, in <module>
bt[Foo()] = 2
File "//lib/python3.8/site-packages/BTrees/_base.py", line 819, in __setitem__
self._set(self._to_key(key), self._to_value(value))
File "//lib/python3.8/site-packages/BTrees/_base.py", line 949, in _set
result = child._set(key, value, ifunset)
File "//lib/python3.8/site-packages/BTrees/_base.py", line 350, in _set
index = self._search(key)
File "//lib/python3.8/site-packages/BTrees/_base.py", line 125, in _search
if k is key or k == key:
File "/tmp/atest.py", line 20, in __eq__
import traceback; traceback.print_stack()
-> Enter coro 1
In coro 2
Len of bt 1
<- Exit coro 1
Matthew Christopher wrote at 2020-8-28 15:18 -0700:
...
I do not need the real code - just the relevant module/class structure.
In that case, I may be able to help. Can you share the full snippet of code that you're trying to execute (with the mock object hierarchy + copy + usage of dm.historical?) If I get that I can probably fix it up and share it back with you if I can get it working
The idea is to copy the (historical) last good tree into
a current connection and then perform the deletion.
If this deletion reproduces the problem, it can be debugged.
Below is the trial code up to the copying.
You can check whether an updated emulation is sufficient
by executing the code outside your normal environment
(to ensure that not the real agent
code is used)
and verify that there is no exception.
```python
from pprint import pprint as pp
from ZODB.DB import DB
from ZODB.FileStorage import FileStorage
from transaction import commit

from types import ModuleType
m = ModuleType("agent.task.history")
m.task = m.history = m.defs = m
from sys import modules
modules[m.__name__] = modules["agent"] = modules["agent.task"] = modules["agent.task.defs"] = m
from persistent import Persistent

class TaskHistoryInfo(Persistent): pass
TaskHistoryInfo.__module__ = "agent.task.history"
m.TaskHistoryInfo = TaskHistoryInfo

class TaskHistoryData(Persistent): pass
TaskHistoryData.__module__ = "agent.task.history"
m.TaskHistoryData = TaskHistoryData

class TaskKind(object): pass
TaskKind.__module__ = "agent.task.history"
m.TaskKind = TaskKind

s = FileStorage("agent.db")
db = DB(s)
c = db.open()
t = c.root()['agent.task.history'].history
from dm.historical import *
th = generateBTreeHistory(t)
h = list(th); h.reverse()
good = h[5185]["obj"]; bad = h[5186]["obj"]

from tempfile import TemporaryFile
with TemporaryFile() as f:
    good._p_jar.exportFile(good._p_oid, f)
    f.seek(0)
    gc = c.importFile(f)

commit()
```
Jason Madden wrote at 2020-8-28 16:08 -0700:
AFAIK it's not possible to actually yield control from a coroutine without an `await`, and I am not aware of a way to use that keyword in a method such as `__eq__`, `__lt__`, etc. Certainly we aren't doing that.
Yeah, it wouldn't be directly done, it would probably be something buried under layers of function calls. This simple example shows control going to different coroutines while inside of an `__eq__` method.
I do not think that anyone will run the event loop nested in a special method. As for "normal" (`async`) coroutines, they need to be specially compiled. I believe that this excludes being called/awaited indirectly from a C library not specially equipped to support this (such as `BTrees`): it would mean that the C runtime stack needs to be saved and later restored.
Jason Madden wrote at 2020-8-28 14:23 -0700:
because there should be no coroutine switch during a single btree operation only between btree operations
That's not out of the realm of possibility, though, is it?
My current working hypothesis:
`_BTree_set` hits an error between the bucket chain update and the children (aka `data`) update. The error causes the children update to be skipped. For some (unknown) reason, the error does not cause an exception.
The error might be special for the asynchronous execution environment (to explain why the problem has not popped up previously).
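The hypothesis can be illustrated with a toy model (hypothetical, not the real `BTrees` internals): if phase 1 (unlinking a bucket from the chain) happens but phase 2 (removing it from the parent's `data`) is skipped, iteration and direct lookup disagree -- matching the observed symptom that `tree[key]` still works while iteration loses keys.

```python
# Toy model of the two-phase deletion: a parent 'data' list of buckets
# plus a singly linked bucket chain used for iteration.
class Bucket:
    def __init__(self, keys):
        self.keys = keys
        self.next = None

b1, b2, b3 = Bucket(['a']), Bucket(['m']), Bucket(['z'])
b1.next, b2.next = b2, b3
data = [b1, b2, b3]      # parent's view, used for key lookup
firstbucket = b1         # chain head, used for iteration

# Phase 1: unlink b2 from the bucket chain...
b1.next = b2.next
# Phase 2 is skipped (simulating the hypothesised unreported error):
# data.remove(b2) never happens.

chain = []
b = firstbucket
while b is not None:
    chain.extend(b.keys)
    b = b.next
print(chain)       # ['a', 'z'] -- 'm' disappears from iteration
print(len(data))   # 3 -- yet lookup through the parent still finds it
```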
We know that Python suppresses some exceptions (e.g. exceptions in `__del__`); maybe our observation is related to exception suppression.
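That suppression is easy to demonstrate: in CPython an exception raised in `__del__` is reported on stderr ("Exception ignored in: ...") but never propagated to the caller.

```python
class Noisy:
    def __del__(self):
        # An exception raised here is never propagated to the caller;
        # CPython only reports it on stderr and carries on.
        raise RuntimeError("boom in __del__")

def trigger():
    n = Noisy()
    del n  # last reference dropped: __del__ runs here (CPython refcounting)

ran_past = False
trigger()
ran_past = True  # reached despite the RuntimeError inside __del__
print("execution continues:", ran_past)
```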
Matthew Christopher wrote at 2020-8-28 15:18 -0700:
... In that case, I may be able to help.
Rethinking, it may be better that you try to reproduce the error -- the big advantage: you can use the real code (no need for an emulation).
Following the code posted previously (without the (incomplete) emulation code), you copy the last good tree. Comparing the last good and first bad tree, you determine which key you must delete in the tree copy to (hopefully) reproduce the damage.
I've fixed up your emulation code, I believe. The main issue was that `TaskKind` is an `enum.Enum`:
Full code that ran without error for me:
from ZODB.DB import DB
from ZODB.FileStorage import FileStorage
from transaction import commit
import enum
from types import ModuleType
m = ModuleType("agent.task.history")
m.task = m.history = m.defs = m
from sys import modules
modules[m.__name__] = modules["agent"] = modules["agent.task"] = modules["agent.task.defs"] = m
from persistent import Persistent
class TaskHistoryInfo(Persistent): pass
TaskHistoryInfo.__module__ = "agent.task.history"
m.TaskHistoryInfo = TaskHistoryInfo
class TaskHistoryData(Persistent): pass
TaskHistoryData.__module__ = "agent.task.history"
m.TaskHistoryData = TaskHistoryData
class TaskKind(enum.Enum):
A = 0
B = 1
C = 2
D = 3
TaskKind.__module__ = "agent.task.history"
m.TaskKind = TaskKind
s=FileStorage("agent.db")
db=DB(s)
c=db.open()
t=c.root()['agent.task.history'].history
from dm.historical import *
th=generateBTreeHistory(t)
h = list(th); h.reverse()
good = h[5185]["obj"]; bad = h[5186]["obj"]
from tempfile import TemporaryFile
with TemporaryFile() as f:
good._p_jar.exportFile(good._p_oid, f)
f.seek(0)
gc = c.importFile(f)
commit()
I tried investigating how you suggested (diffing the two trees to see what changed). I used the following function:
def compare_btrees(lhs, rhs):
lhs_keys = {key for key in lhs}
rhs_keys = {key for key in rhs}
keys_only_in_lhs = lhs_keys - rhs_keys
keys_only_in_rhs = rhs_keys - lhs_keys
    if keys_only_in_lhs:
print('{} keys only in lhs ({})'.format(
len(keys_only_in_lhs), keys_only_in_lhs))
    if keys_only_in_rhs:
print('{} keys only in rhs ({})'.format(
len(keys_only_in_rhs), keys_only_in_rhs))
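For illustration, the same key-diff approach can be exercised with plain dicts standing in for the two tree states (iterating a dict yields its keys, just like a BTree); `diff_keys` here is a hypothetical stand-alone variant of the comparison above:

```python
def diff_keys(lhs, rhs):
    """Return (keys only in lhs, keys only in rhs)."""
    lhs_keys = set(lhs)
    rhs_keys = set(rhs)
    return lhs_keys - rhs_keys, rhs_keys - lhs_keys

good = {'a': 1, 'b': 2, 'c': 3}   # stand-in for tree state 5185
bad = {'a': 1, 'c': 3}            # stand-in for tree state 5186
only_good, only_bad = diff_keys(good, bad)
print(only_good)  # {'b'} -- the key deleted between the two states
print(only_bad)   # set()
```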
From what I can tell, the 5185 -> 5186 BTree diff consisted of the following change:
1 keys only in lhs ({'testaccount1batchtest18 22F1F921291938C1$abs-2-job406589010-4 22F1E3E288A8B2FD$job-1$abs-2-job406589010-4-task-182$0:662512005'})
I tried doing the following which didn't seem to trigger any errors:
with TemporaryFile() as f:
good._p_jar.exportFile(good._p_oid, f)
f.seek(0)
gc = c.importFile(f)
gc._check()
gc.pop('testaccount1batchtest18 22F1F921291938C1$abs-2-job406589010-4 22F1E3E288A8B2FD$job-1$abs-2-job406589010-4-task-182$0:662512005')
gc._check()
I tried on my Linux box though - I will try on my Windows box too and see if there is any difference.
Matthew Christopher wrote at 2020-9-3 17:01 -0700:
... I tried doing the following which didn't seem to trigger any errors:
```python
with TemporaryFile() as f:
    good._p_jar.exportFile(good._p_oid, f)
    f.seek(0)
    gc = c.importFile(f)
gc._check()
gc.pop('testaccount1batchtest18 22F1F921291938C1$abs-2-job406589010-4 22F1E3E288A8B2FD$job-1$abs-2-job406589010-4-task-182$0:662512005')
gc._check()
```
I tried on my Linux box though - I will try on my Windows box too and see if there is any difference.
When you try on Windows, please use the real code (not the emulation): it might be that peculiarities of the real code trigger the error (those would likely be related to destructors).
When you cannot reproduce the problem, then the error does not depend only on the data; something dynamic must be involved as well -- maybe a `gc` run or a context switch (even though I do not see how the latter could happen).
I tried on my Windows machine and actually I am getting an error here (I believe on the `list(th)`):
th=generateBTreeHistory(t)
h = list(th); h.reverse()
The error is:
C:\work\venv-test\Scripts\python.exe C:/work/src/temp2.py
Traceback (most recent call last):
File "C:\work\venv-test\lib\site-packages\dm\historical\__init__.py", line 60, in generateHistory
if not history: raise StopIteration()
StopIteration
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "C:/work/src/temp2.py", line 53, in <module>
h = list(th); h.reverse()
File "C:\work\venv-test\lib\site-packages\dm\historical\__init__.py", line 129, in generateBTreeHistory
if a.next() is None: heappop(heap)
File "C:\work\venv-test\lib\site-packages\dm\historical\__init__.py", line 100, in next
try: value = next(self.gen)
RuntimeError: generator raised StopIteration
Any idea why that would be?
Matthew Christopher wrote at 2020-9-4 11:56 -0700:
I tried on my Windows machine and actually I am getting an error here (I believe on the `list(th)`):
th=generateBTreeHistory(t)
h = list(th); h.reverse()
The error is:
C:\work\venv-test\Scripts\python.exe C:/work/src/temp2.py
Traceback (most recent call last):
  File "C:\work\venv-test\lib\site-packages\dm\historical\__init__.py", line 60, in generateHistory
    if not history: raise StopIteration()
StopIteration

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "C:/work/src/temp2.py", line 53, in <module>
    h = list(th); h.reverse()
  File "C:\work\venv-test\lib\site-packages\dm\historical\__init__.py", line 129, in generateBTreeHistory
    if a.next() is None: heappop(heap)
  File "C:\work\venv-test\lib\site-packages\dm\historical\__init__.py", line 100, in next
    try: value = next(self.gen)
RuntimeError: generator raised StopIteration
Any idea why that would be?
An incompatibility of `dm.historical` with newer Python versions. The error does not occur with Python 3.6; it occurs with Python 3.9 (and may have been introduced with a Python version in between).
I have released `dm.historical==2.0.3`, which fixes this incompatibility.
Any idea why that would be?
An incompatibility of `dm.historical` with newer Python versions. The error does not occur with Python 3.6, it occurs with Python 3.9 (and may have been introduced with a Python version in between).
Python 3.7 adopted PEP 479 "meaning that StopIteration exceptions raised directly or indirectly in coroutines and generators are transformed into RuntimeError exceptions."
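The PEP 479 behaviour is easy to reproduce on Python 3.7+ with a minimal generator (illustrative sketch, unrelated to `dm.historical` itself):

```python
# On Python 3.7+ (PEP 479), a StopIteration raised inside a generator body
# is converted into a RuntimeError instead of silently ending iteration.
def gen():
    yield 1
    raise StopIteration()  # pre-3.7 idiom for ending a generator early

g = gen()
first = next(g)  # 1
try:
    next(g)
    outcome = "no error"
except RuntimeError:
    outcome = "RuntimeError"
print(first, outcome)
```

On Python 3.6 the second `next(g)` would simply end the iteration; from 3.7 on it raises `RuntimeError: generator raised StopIteration`, matching the traceback above.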
@jamadden has it -- I was on 3.6 on my Linux but 3.7 on Windows.
Even with `dm.historical==2.0.3` I am still getting an error with 3.7 - @d-maurer are you sure you've fixed it in 2.0.3?
Traceback (most recent call last):
File "C:\work\venv\lib\site-packages\dm\historical\__init__.py", line 131, in generateBTreeHistory
raise StopIteration()
StopIteration
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "C:/work/temp2.py", line 53, in <module>
h = list(th); h.reverse()
RuntimeError: generator raised StopIteration
Process finished with exit code 1
I reverted to Python 3.6 and everything passed in Windows as well, so it looks like probably no reliable repro... That doesn't necessarily surprise me as this issue has been really hard for us to pin down as well (our first thoughts were things like threading issues, etc)...
Are there next steps that we could take to try to pin down what's going on here?
Matthew Christopher wrote at 2020-9-4 14:52 -0700:
@jamadden has it -- I was on 3.6 on my Linux but 3.7 on Windows.
Even with `dm.historical==2.0.3` I am still getting an error with 3.7 - @d-maurer are you sure you've fixed it in 2.0.3?
There has been another occurrence of the same problem. Fixed in 2.0.4.
... I reverted to Python 3.6 and everything passed in Windows as well, so it looks like probably no reliable repro... That doesn't necessarily surprise me as this issue has been really hard for us to pin down as well (our first thoughts were things like threading issues, etc)...
Are there next steps that we could take to try to pin down what's going on here?
Have you made the Windows trial with the emulation or your real code? In the first case, you could retry with the real code.
The hypothesis: maybe one of the destructors (they can be executed inside the `BTrees` C code) does something strange (e.g. accesses the tree currently under modification).
If this cannot reproduce the problem, then it seems to be truely dynamic -- which makes the analysis far more difficult.
An analysis approach could come from the following thoughts: your bucket deletion (implemented by `_BTree_set` in `BTreeTemplate.c`) consists of two phases: 1. removal from the bucket chain (line 917),
Have you made the Windows trial with the emulation or your real code?
I tried both -- our real code doesn't define any special destructors (but I tried it anyway). I also tried 3.7 with dm.historical 2.0.4 (just to confirm that somehow the updated version of Python wasn't causing anything). For that also there was no repro.
One could run the application inside a debugger and put a breakpoint on the error exit and then try to reproduce the problem. When it occurs, hopefully, the breakpoint is hit and one can (again hopefully) determine what caused the error.
I'm investigating two angles right now:
- Running the application under a debugger and placing some breakpoints when I get a repro.
- Coming up with a more minimal repro that can be shared.
For 1, is there a document or instructions on performing this process? It would have to be in Windows (the only environment I can get a repro in). I am somewhat familiar with windbg, but have never debugged a native Python module loaded by Python before. How can I get the symbols for the native code - are they included in the `.whl` for Windows? (I don't think I saw them, but maybe I missed them when I was looking before.)
Matthew Christopher wrote at 2020-9-15 17:16 -0700:
... I'm investigating two angles right now:
- Running the application under a debugger and placing some breakpoints when I get a repro.
- Coming up with a more minimal repro that can be shared.
For 1, is there a document or instructions on performing this process?
This depends on the debugger used. I have used `gdb` in the past for similar purposes. For `gdb`, there is a set of commands which facilitates the analysis of Python objects from the C level (e.g. print the Python stack trace, a Python object, etc.).
`BTrees` debugging is particularly difficult due to the extensive use of (C preprocessor) macros. I tackled this problem in the past by running the preprocessor standalone, removing the line information introduced by the preprocessor, and using the result as the new source file. This ensures that you can put a breakpoint at a precise place (your debugger might not need this kind of preprocessing).
It would have to be in Windows (the only environment I can get a repro). I am somewhat familiar with windbg, but have never debugged a native Python module loaded by Python before. How can I get the symbols for the native code - are they included in the `.whl` for Windows?
This is quite straightforward with `gdb` (I do not know `windbg`). Even with optimizations active, Python is typically compiled to retain debugging symbols (should this not be the case, you need to compile Python yourself). Extensions, such as `BTrees`, typically are compiled as Python was compiled. Thus, they, too, should have debugging symbols.
With `gdb` (under Linux), I could:
start Python under the debugger (or even "attach" a running Python process -- this nowadays requires special OS level configuration under Linux)
import the C extension (debugging symbols are made available at this point, if the extension has them)
use Ctrl-C to enter gdb
install the necessary breakpoints
"continue" to return to Python
run the Python code
We were able to get a minimal repro of this issue and determine why we were only seeing the issue in Windows. You can run the repro for yourself, I've posted the code on GitHub here. The problem also repros in Linux.
The key details (this is basically contained in the readme of that repo as well):
Because there is no issue with the native version of BTrees we've been able to work around this problem by ensuring that we're using the native version of BTrees.
Matthew Christopher wrote at 2020-11-6 10:37 -0800:
We were able to get a minimal repro of this issue and determine why we were only seeing the issue in Windows. You can run the repro for yourself, I've posted the code on GitHub here.
Congratulations!
I was able to isolate (and quickly reproduce) the problem. The code below depends on your `agent.db`.
from pprint import pprint as pp
from ZODB.DB import DB
from ZODB.FileStorage import FileStorage
import enum
from types import ModuleType
m = ModuleType("agent.task.history")
m.task = m.history = m.defs = m
from sys import modules
modules[m.__name__] = modules["agent"] = modules["agent.task"] = modules["agent.task.defs"] = m
from persistent import Persistent
class TaskHistoryInfo(Persistent): pass
TaskHistoryInfo.__module__ = "agent.task.history"
m.TaskHistoryInfo = TaskHistoryInfo
class TaskHistoryData(Persistent): pass
TaskHistoryData.__module__ = "agent.task.history"
m.TaskHistoryData = TaskHistoryData
class TaskKind(enum.Enum):
A = 0
B = 1
C = 2
D = 3
TaskKind.__module__ = "agent.task.history"
m.TaskKind = TaskKind
s=FileStorage("agent.db")
db=DB(s)
c=db.open()
t=c.root()['agent.task.history'].history
from dm.historical import *
from transaction import commit
from tempfile import TemporaryFile
bad = getObjectAt(t, b'\x03\xda\x0bi 2Wf')
good = getObjectAt(t, b'\x03\xda\x0bi\x02\xa9\xce\xbb')
key = 'testaccount1batchtest18 22F1F921291938C1$abs-2-job406589010-4 22F1E3E288A8B2FD$job-1$abs-2-job406589010-4-task-182$0:662512005'
with TemporaryFile() as f:
good._p_jar.exportFile(good._p_oid, f)
f.seek(0)
gc = c.importFile(f)
c.root()["good"] = gc; commit()
gc._check()
del gc[key]; gc._check() # still good
commit()
gc._p_deactivate(); gc._check() # bad
This means that the problem is not caused by the deletion operation itself. Instead, the tree structure gets broken via storing/reloading -- almost surely due to a missing `_p_changed = True`.
And indeed: there are 2 missing `_p_changed = True` assignments:
--- _base.py~ 2020-03-11 09:12:06.746867179 +0100
+++ _base.py 2020-11-06 23:56:57.799589569 +0100
@@ -100,6 +100,7 @@
def _deleteNextBucket(self):
next = self._next
if next is not None:
+ self._p_changed = True
self._next = next._next
def _search(self, key):
@@ -1034,6 +1035,7 @@
self._firstbucket = child._next
removed_first_bucket = True
del data[index]
+ self._p_changed = True
return removed_first_bucket, value
With these additions, the problem disappears.
The first `_p_changed` is actually not necessary (as `Bucket` derives from `Persistent`).
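Why the second missing `_p_changed` loses the update can be mimicked with a toy persistence scheme (hypothetical; `TinyPersistent` only caricatures how attribute hooks detect changes): rebinding an attribute is detected, but mutating a sub-structure in place -- like `del data[index]` on the node's internal list -- goes unnoticed unless `_p_changed` is set explicitly.

```python
# Toy sketch: change detection via __setattr__, as a stand-in for the
# real Persistent machinery.
class TinyPersistent:
    def __init__(self):
        object.__setattr__(self, '_p_changed', False)

    def __setattr__(self, name, value):
        if not name.startswith('_p_'):
            # attribute *rebinding* marks the object as changed
            object.__setattr__(self, '_p_changed', True)
        object.__setattr__(self, name, value)

b = TinyPersistent()
b.data = [1, 2, 3]    # rebinding -> detected
b._p_changed = False  # simulate a commit resetting the flag
b.data.append(4)      # in-place mutation -> NOT detected
print(b._p_changed)   # False: on reload, this change would be lost
```

This mirrors the bug: the in-memory tree looked fine until `_p_deactivate()` forced a reload of the stale stored state.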
Versions: Python 3.6
I have a `BTrees.OOBTree.BTree` in a ZODB which seems to have somehow become corrupted. While our system was running, we observed that the data the BTree was returning seemed suspicious (while iterating the tree, items which should have been there were not actually there). We took a snapshot of the tree and manually examined it, and determined that the structure of the BTree seems to have become corrupted.
When we try to iterate the tree (or its keys):
We get this error:
Interestingly, as far as I can tell our live service was not getting this exception (but it definitely was also not returning all of the data it should have been) - I'm not sure why the discrepancy between what our live service was seeing with this DB versus what I see when I download it and look through it locally.
`BTrees.check.check` shows no issues. Obviously, I am not modifying the tree while iterating over it in my debugging environment - but when I look deeper into the structure of the tree by iterating the buckets one by one I see this:
From what I can tell, 0 size buckets in the "middle" of the bucket list is not expected and is what triggers the exception I'm getting. Additionally if you look closely, it seems like a large swath of buckets is lost, because there should be a relatively even distribution of GUIDs as keys (and I see that from first digit 0-4 and first digit 9-f), but first digit 5-8 is missing.
My questions are:
- Why doesn't `BTrees.check.check` notice this?

It's possible that this is related to #68, although they look sufficiently different to me that I figured I would post a new issue. It is also possible that this only started happening in `BTrees==4.6.0` as we moved to that version relatively recently (we were on `BTrees==4.5.1` previously) - but given how rare this is occurring we may just not have noticed it until now...