Open jgrahn opened 4 years ago
Thanks for catching this and for the reproducible case (that I might not take a look before 2 weeks though).
However not awaiting the end of the transaction in onCreate is a developer mistake that should be reported though (here it crashes badly but it should definitely not succeed). adding await in onCreate should fix the issue. Indeed during onCreate we are re-using an existing transaction hence its weird handling (that I will need to clearup at some point) to ensure a correct update or fail and a consistent versioning of the database.
Thanks for in-depth analysis (sorry for having you dive into some historical code...)
No worries, the fix (adding an await
) is as you say trivial once you've realised where the problem lies. The reason I think sqflite could still handle it better is that the developer mistake in question is easy to overlook, and the symptoms indirect and subtle. Even having automated tests might not detect it reliably, since it is timing related and likely depends on the speed of the device the app is installed on. There is no urgency to address this from my side, though.
Regarding, the logic I found confusing; as a pair of new eyes on this piece of code, it would have been easier to follow what goes on if the two cases (during open vs. after) were more explicitly treated as such in the different functions involved. For example, to understand what goes on in _runTransaction()
you need to know that:
transactionRefCount
is only relevant during open, and will not have any effect after open because of its interaction with a lock in txnSynchronized()
ensuring that it stays zero.openTransaction
will be passed as the txn
argument during open (through several layers of indirection).transactionRefCount
will in fact start at 1 and not at 0 during open. This is because the all callbacks during open are themselves wrapped inside another call to transaction()
inside doOpen()
, and we are in fact reentering _runTransaction()
with openTransaction
having been modified between the layers of nesting.As a suggestion, I would scrap transactionRefCount
entirely, remove the txn
property (which is an alias for openTransaction
-- it was probably once meant to have a broader purpose but is currently just obfuscating things), and simply check for openTransaction
being non-null as an indicator for whether we are in the open phase or not.
The txn
argument could be removed from the numerous layers and functions currently passing it down to _runTransaction()
, since _runTransaction()
anyway only does one out of two things: creates a new transaction object or uses openTransaction
.
You would need to add some lines of code in doOpen()
that explicitly ends openTransaction
after all the callbacks instead of relying on a nested transaction()
call, but that's about it.
I can potentially make a patch with the above changes if you'd like (and assuming I find the time before you do yourself). Not sure what policy you have on contributions?
Thanks for the in-depth analysis. I'm kind of vacation so won't tackle this or any review for 1 or 2 weeks unless something is urgent.
As I said some code are historical (the ref cound was used for batch in a transaction and initially transaction were re-entrant using zone and likely some code is not needed any more), that is the life of code.
I can potentially make a patch with the above changes if you'd like (and assuming I find the time before you do yourself). Not sure what policy you have on contributions?
Yes PR are welcome! There is not specify policy besides test your own code. There are some unit tests that are more convenient using ffi (in sqflite_common_test) and a test app to manually run especially when touching a sensitive are such as this one.
The rule "don't fix it is unless broken is always a good rule.". Code is somehow subjective so I don't want to sound rude if you spend some time on something that I might not accept if I don't see any benefit in terms of performance, readability or robustness.
But a pair of fresh eye is always welcome and the analysis you wrote looks at a first glance correct. That is some code that I write 2 years ago so I cannot point right away if what you suggest is correct so patch/pull request are welcome! (running the travis.dart script - mostly running unit tests - will give a first level of quality assessment).
Maybe also the doc could be better regarding the needed await and tracking it in debug with some better assertion (for example testing that the transaction is will valid before/after running a command) could also report a developer mistake in a better way.
Thanks again and sorry for not giving you a better answer at this time!
Sure, I absolutely understand what it means to maintain a code base like this. No judgement passed whatsoever. And I can see the traces of "organic evolution" that you mention (also with e.g. the txn
property), so I get why it looks like it does. As I spent an hour or so breaking down how it works for myself anyway, I figured I'd share the perspective that comes with it, that's all.
Additionally, I feel you have been responding both promptly and very well here, so ⭐ for that. 🙂
If a callback such as
onCreate
launches a transaction without waiting for it to finish withawait
, subsequent calls toDatabase.transaction()
may result innull
being passed as the argumenttxn
to theaction
function.This happens because of a race condition between the member variables
openTransaction
andtransactionRefCount
. Specifically,openTransaction
is set tonull
in the cleanup block ofdoOpen()
on line 737 in database_mixin.dart soon afteronCreate
returns, whiletransactionRefCount
is decremented at the end of the transactions it (may have) launched, which happens inside the callback wrapper_runTransaction()
on line 478.The implementation essentially assumes that the object can be in one of two states:
transactionRefCount
is used as a reference counter for it. (The mechanism by which this is achieved is rather confusing, but ostensibly correct.)The issue occurs when a transaction during phase 1 is still in flight during a call to
transaction()
in phase 2. The variableopenTransaction
will then benull
, causing the call to go through line 327 intxnSynchronized()
, which (despite its argumenttxn
) is equivalent toreturn action(null);
. However,_runTransaction()
fails to create a new transaction object to pass astxn
as it normally would, sincetransactionRefCount
is still non-zero. The lock intxnSynchronized()
will not prevent this as transactions started during open do not acquire the lock. A potential solution might be to simply acquire the lock during open, though I have not analysed the consequences of that in any detail.While it could be argued that callbacks during open (such as
onCreate
) ought to not launch asynchronous transactions without waiting for them, it is also not obviously the case that it is forbidden. Furthermore, it is a mistake easy enough to make that it would be better if the library at the very least handled the situation more robustly.See below for a code snippet reproducing the issue, including a resulting stack trace.
This yields the following stack trace.