tekartik / sqflite

SQLite flutter plugin
BSD 2-Clause "Simplified" License
2.87k stars 522 forks source link

Race condition causing Database.transaction() to pass null to action function #481

Open jgrahn opened 4 years ago

jgrahn commented 4 years ago

If a callback such as onCreate launches a transaction without waiting for it to finish with await, subsequent calls to Database.transaction() may result in null being passed as the argument txn to the action function.

This happens because of a race condition between the member variables openTransaction and transactionRefCount. Specifically, openTransaction is set to null in the cleanup block of doOpen() on line 737 in database_mixin.dart soon after onCreate returns, while transactionRefCount is decremented at the end of the transactions it (may have) launched, which happens inside the callback wrapper _runTransaction() on line 478.

The implementation essentially assumes that the object can be in one of two states:

  1. In the process of opening, in which case a single transaction object is used for all transaction calls, and transactionRefCount is used as a reference counter for it. (The mechanism by which this is achieved is rather confusing, but ostensibly correct.)
  2. Already open, in which case a lock ensures that only one transaction is in flight at any given time.

The issue occurs when a transaction during phase 1 is still in flight during a call to transaction() in phase 2. The variable openTransaction will then be null, causing the call to go through line 327 in txnSynchronized(), which (despite its argument txn) is equivalent to return action(null);. However, _runTransaction() fails to create a new transaction object to pass as txn as it normally would, since transactionRefCount is still non-zero. The lock in txnSynchronized() will not prevent this as transactions started during open do not acquire the lock. A potential solution might be to simply acquire the lock during open, though I have not analysed the consequences of that in any detail.

While it could be argued that callbacks during open (such as onCreate) ought to not launch asynchronous transactions without waiting for them, it is also not obviously the case that it is forbidden. Furthermore, it is a mistake easy enough to make that it would be better if the library at the very least handled the situation more robustly.

See below for a code snippet reproducing the issue, including a resulting stack trace.

import 'package:path/path.dart';
import 'package:sqflite/sqflite.dart';

const _initScripts = [
  '''
  create table foo(
    foo_id text primary key
  ) without rowid;
  ''',
  '''
  create table bar(
    foo_id text not null,
    data text not null,
    primary key(foo_id, data),
    foreign key(foo_id) references foo(foo_id) on delete cascade
  ) without rowid;
  ''',
];

typedef Action = Future<List<Map<String, dynamic>>> Function(Transaction txn);

void triggerBug() async {
  final fdb = getDatabasesPath().then((basePath) => openDatabase(
        join(basePath, "debug.db"),
        version: 1,
        onCreate: (db, version) async {
          db.transaction((txn) async {
            for (var sql in _initScripts) {
              await txn.execute(sql);
            }
          });
        },
      ));
  Sqflite.setDebugModeOn(true);
  Action action = (txn) => txn.rawQuery('''
        select foo_id from foo;
      ''');
  () async {
    var tuples = await fdb.then((db) => db.transaction(action));
    print(tuples);
  }();
}

This yields the following stack trace.

E/flutter ( 1066): [ERROR:flutter/lib/ui/ui_dart_state.cc(166)] Unhandled Exception: NoSuchMethodError: The method 'rawQuery' was called on null.
E/flutter ( 1066): Receiver: null
E/flutter ( 1066): Tried calling: rawQuery("          select bucket_id from bucket;\n        ")
E/flutter ( 1066): #0      Object.noSuchMethod (dart:core-patch/object_patch.dart:51:5)
E/flutter ( 1066): #1      new SQLiteBucketDB.<anonymous closure>.<anonymous closure> (package:tokencounter/state/persistence/sqlite.dart:51:52)
E/flutter ( 1066): #2      SqfliteDatabaseMixin._runTransaction (package:sqflite_common/src/database_mixin.dart:475:28)
E/flutter ( 1066): #3      SqfliteDatabaseMixin.transaction.<anonymous closure> (package:sqflite_common/src/database_mixin.dart:492:14)
E/flutter ( 1066): #4      SqfliteDatabaseMixin.txnSynchronized.<anonymous closure> (package:sqflite_common/src/database_mixin.dart:327:22)
E/flutter ( 1066): #5      BasicLock.synchronized (package:synchronized/src/basic_lock.dart:32:26)
E/flutter ( 1066): #6      SqfliteDatabaseMixin.txnSynchronized (package:sqflite_common/src/database_mixin.dart:323:33)
E/flutter ( 1066): #7      SqfliteDatabaseMixin.txnWriteSynchronized (package:sqflite_common/src/database_mixin.dart:345:7)
E/flutter ( 1066): #8      SqfliteDatabaseMixin.transaction (package:sqflite_common/src/database_mixin.dart:491:12)
E/flutter ( 1066): #9      SQLiteBucketDB._transaction.<anonymous closure> (package:tokencounter/state/persistence/sqlite.dart:61:32)
E/flutter ( 1066): #10     _rootRunUnary (dart:async/zone.dart:1198:47)
E/flutter ( 1066): #11     _CustomZone.runUnary (dart:async/zone.dart:1100:19)
E/flutter ( 1066): #12     _FutureListener.handleValue (dart:async/future_impl.dart:143:18)
E/flutter ( 1066): #13     Future._propagateToListeners.handleValueCallback (dart:async/future_impl.dart:696:45)
E/flutter ( 1066): #14     Future._propagateToListeners (dart:async/future_impl.dart:725:32)
E/flutter ( 1066): #15     Future._completeWithValue (dart:async/future_impl.dart:529:5)
E/flutter ( 1066): #16     _AsyncAwaitCompleter.complete (dart:async-patch/async_patch.dart:40:15)
E/flutter ( 1066): #17     _completeOnAsyncReturn (dart:async-patch/async_patch.dart:311:13)
E/flutter ( 1066): #18     BasicLock.synchronized (package:synchronized/src/basic_lock.dart)
E/flutter ( 1066): <asynchronous suspension>
E/flutter ( 1066): #19     ReentrantLock.synchronized (package:synchronized/src/reentrant_lock.dart:30:17)
E/flutter ( 1066): #20     SqfliteDatabaseFactoryMixin.openDatabase (package:sqflite_common/src/factory_mixin.dart:71:17)
E/flutter ( 1066): #21     openDatabase (package:sqflite/sqflite.dart:152:26)
E/flutter ( 1066): #22     SQLiteBucketDB._openDB.<anonymous closure> (package:tokencounter/state/persistence/sqlite.dart:36:45)
E/flutter ( 1066): #23     _rootRunUnary (dart:async/zone.dart:1198:47)
E/flutter ( 1066): #24     _CustomZone.runUnary (dart:async/zone.dart:1100:19)
E/flutter ( 1066): #25     _FutureListener.handleValue (dart:async/future_impl.dart:143:18)
E/flutter ( 1066): #26     Future._propagateToListeners.handleValueCallback (dart:async/future_impl.dart:696:45)
E/flutter ( 1066): #27     Future._propagateToListeners (dart:async/future_impl.dart:725:32)
E/flutter ( 1066): #28     Future._completeWithValue (dart:async/future_impl.dart:529:5)
E/flutter ( 1066): #29     _AsyncAwaitCompleter.complete (dart:async-patch/async_patch.dart:40:15)
E/flutter ( 1066): #30     _completeOnAsyncReturn (dart:async-patch/async_patch.dart:311:13)
E/flutter ( 1066): #31     SqfliteDatabaseFactoryMixin.getDatabasesPath (package:sqflite_common/src/factory_mixin.dart)
E/flutter ( 1066): <asynchronous suspension>
E/flutter ( 1066): #32     getDatabasesPath (package:sqflite/sqflite.dart:168:54)
E/flutter ( 1066): #33     SQLiteBucketDB._openDB (package:tokencounter/state/persistence/sqlite.dart:36:7)
E/flutter ( 1066): #34     new SQLiteBucketDB (package:tokencounter/state/persistence/sqlite.dart:48:39)
E/flutter ( 1066): #35     new _BucketListPageState (package:tokencounter/main.dart:45:26)
E/flutter ( 1066): #36     BucketListPage.createState (package:tokencounter/main.dart:37:41)
E/flutter ( 1066): #37     new StatefulElement (package:flutter/src/widgets/framework.dart:4635:25)
E/flutter ( 1066): #38     StatefulWidget.createElement (package:flutter/src/widgets/framework.dart:900:38)
E/flutter ( 1066): #39     Element.inflateWidget (package:flutter/src/widgets/framework.dart:3485:40)
E/flutter ( 1066): #40     Element.updateChild (package:flutter/src/widgets/framework.dart:3258:18)
E/flutter ( 1066): #41     SingleChildRenderObjectElement.mount (package:flutter/src/widgets/framework.dart:5874:14)
E/flutter ( 1066): #42     Element.inflateWidget (package:flutter/src/widgets/framework.dart:3490:14)
E/flutter ( 1066): #43     Element.updateChild (package:flutter/src/widgets/framework.dart:3258:18)
E/flutter ( 1066): #44     ComponentElement.performRebuild (package:flutter/src/widgets/framework.dart:
alextekartik commented 4 years ago

Thanks for catching this and for the reproducible case (that I might not take a look before 2 weeks though).

However not awaiting the end of the transaction in onCreate is a developer mistake that should be reported though (here it crashes badly but it should definitely not succeed). adding await in onCreate should fix the issue. Indeed during onCreate we are re-using an existing transaction hence its weird handling (that I will need to clearup at some point) to ensure a correct update or fail and a consistent versioning of the database.

Thanks for in-depth analysis (sorry for having you dive into some historical code...)

jgrahn commented 4 years ago

No worries, the fix (adding an await) is as you say trivial once you've realised where the problem lies. The reason I think sqflite could still handle it better is that the developer mistake in question is easy to overlook, and the symptoms indirect and subtle. Even having automated tests might not detect it reliably, since it is timing related and likely depends on the speed of the device the app is installed on. There is no urgency to address this from my side, though.

Regarding, the logic I found confusing; as a pair of new eyes on this piece of code, it would have been easier to follow what goes on if the two cases (during open vs. after) were more explicitly treated as such in the different functions involved. For example, to understand what goes on in _runTransaction() you need to know that:

As a suggestion, I would scrap transactionRefCount entirely, remove the txn property (which is an alias for openTransaction -- it was probably once meant to have a broader purpose but is currently just obfuscating things), and simply check for openTransaction being non-null as an indicator for whether we are in the open phase or not.

The txn argument could be removed from the numerous layers and functions currently passing it down to _runTransaction(), since _runTransaction() anyway only does one out of two things: creates a new transaction object or uses openTransaction.

You would need to add some lines of code in doOpen() that explicitly ends openTransaction after all the callbacks instead of relying on a nested transaction() call, but that's about it.

I can potentially make a patch with the above changes if you'd like (and assuming I find the time before you do yourself). Not sure what policy you have on contributions?

alextekartik commented 4 years ago

Thanks for the in-depth analysis. I'm kind of vacation so won't tackle this or any review for 1 or 2 weeks unless something is urgent.

As I said some code are historical (the ref cound was used for batch in a transaction and initially transaction were re-entrant using zone and likely some code is not needed any more), that is the life of code.

I can potentially make a patch with the above changes if you'd like (and assuming I find the time before you do yourself). Not sure what policy you have on contributions?

Yes PR are welcome! There is not specify policy besides test your own code. There are some unit tests that are more convenient using ffi (in sqflite_common_test) and a test app to manually run especially when touching a sensitive are such as this one.

The rule "don't fix it is unless broken is always a good rule.". Code is somehow subjective so I don't want to sound rude if you spend some time on something that I might not accept if I don't see any benefit in terms of performance, readability or robustness.

But a pair of fresh eye is always welcome and the analysis you wrote looks at a first glance correct. That is some code that I write 2 years ago so I cannot point right away if what you suggest is correct so patch/pull request are welcome! (running the travis.dart script - mostly running unit tests - will give a first level of quality assessment).

Maybe also the doc could be better regarding the needed await and tracking it in debug with some better assertion (for example testing that the transaction is will valid before/after running a command) could also report a developer mistake in a better way.

Thanks again and sorry for not giving you a better answer at this time!

jgrahn commented 4 years ago

Sure, I absolutely understand what it means to maintain a code base like this. No judgement passed whatsoever. And I can see the traces of "organic evolution" that you mention (also with e.g. the txn property), so I get why it looks like it does. As I spent an hour or so breaking down how it works for myself anyway, I figured I'd share the perspective that comes with it, that's all.

Additionally, I feel you have been responding both promptly and very well here, so ⭐ for that. 🙂