simolus3 / drift

Drift is an easy to use, reactive, typesafe persistence library for Dart & Flutter.
https://drift.simonbinder.eu/
MIT License

Database locks during bulk/batch insert #245

Open DevonJerothe opened 4 years ago

DevonJerothe commented 4 years ago

When trying to store large amounts of entries the database crashes/locks. Smaller lists work fine however once I begin hitting larger lists (~250,000 objects) the db begins to act up.

Below is my call to insert/update the table (I am running this incrementally when dealing with large lists):

await GetIt.instance<DatabaseService>().loansDao.batchUpdate(
            user.id, _newLoans.sublist(start, end),
            folder: folder);

print('Batch $i / $_batches - total inserted $start / $total of folder: $folder');

And here is my inserting:

Future batchUpdate(
    String userID,
    List<api.Loan> loans, {
    api.LoansFolder folder = api.LoansFolder.all,
  }) async {
    final _items = <LoanRecordsCompanion>[];
    final _updated = DateTime.now();
    for (var loan in loans) {
      final _item = LoanRecordsCompanion(
        loanId: Value(loan.id),
        folder: addField<String>(describeEnum(folder),
            fallback: describeEnum(api.LoansFolder.all)),
        userID: Value(userID),
        firstName: addField<String>(loan?.firstName),
        lastName: addField<String>(loan?.lastName),
        cellPhone: addField<String>(phoneFormat(loan?.cellPhone)),
        homePhone: addField<String>(phoneFormat(loan?.homePhone)),
        officePhone: addField<String>(phoneFormat(loan?.officePhone)),
        email: addField<String>(loan?.email),
        dateClosed: addField<DateTime>(loan?.dateClosed),
        loanMilestone: addField<String>(loan?.loanMilestone),
        loanNumber: addField<String>(loan?.loanNumber),
        loanRate: addField<double>(loan?.loanRate),
        contactID: addField<String>(loan?.contactId),
        lastUpdated: addField<DateTime>(_updated),
      );
      _items.add(_item);
    }
    return transaction(() async {
      await into(loanRecords)
          .insertAll(_items, orReplace: true);
      //await cleanLoans(userID, _updated, folder: folder);
    });
  }

Locks occur no matter what batch size I set. Could this be an OS throttling issue? Last I checked, SQLite limits each insert statement to about 1 MB.
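For reference, a chunked variant of the insert above can be sketched with moor's batch API. This is a sketch only: `insertInChunks` is a hypothetical helper, `loanRecords` is the table from the snippet above, and `insertAllOnConflictUpdate` is the later drift-era name for `insertAll(..., orReplace: true)`, so adjust for your version:

```dart
// Sketch only: chunked bulk insert via moor's batch API.
// Each batch runs inside its own transaction, keeping statements small.
Future<void> insertInChunks(List<LoanRecordsCompanion> items,
    {int chunkSize = 500}) async {
  for (var start = 0; start < items.length; start += chunkSize) {
    final end =
        (start + chunkSize < items.length) ? start + chunkSize : items.length;
    await batch((b) {
      // Replaces rows on conflict, like insertAll(..., orReplace: true).
      b.insertAllOnConflictUpdate(loanRecords, items.sublist(start, end));
    });
  }
}
```

Keeping chunks small bounds both the statement size sent to SQLite and the time any single transaction holds the write lock.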

simolus3 commented 4 years ago

I've never tried to operate on that many rows in moor (250k is really quite a number), but it should be straightforward to reproduce; I'll take a detailed look tomorrow.

locks occur no matter what I set the limit to per batch

That's interesting. So the first batches work but then transactions just don't complete at some point? That definitely shouldn't happen, but I'm not sure what could be causing this. I doubt it has to do with the OS because sqlite should only write to the file after each transaction.

DevonJerothe commented 4 years ago

Yes, the batch will run for a while, then noticeably slow down the app until it locks or crashes. I'm usually able to get about half the objects into the database before the issues arise, but it's unpredictable.

I've tested batch sizes ranging from 50 on the low end to 50,000 on the high end.

DevonJerothe commented 4 years ago

Just to clarify how our system works: we have 5 data types separated into lists that are then stored in the db. Each type has a Bloc handling its db actions. Once the user logs in, all the blocs fire a batch update. I only batch-insert if the list is quite large; otherwise, I insert the whole list. Most of these lists are small and insert with no issue, until we hit the edge case where one of them contains a large set of data.

I'm doing more tests to see if the transactions are being called simultaneously and locking there.

UPDATE: Alright, so I set up a queue for all the inserts and it works much better. The issue seems to come from transactions being fired concurrently, as well as from inserting more than 100k items in one insert.

flutter: Setting Up Notifications...
flutter: Events: 0
flutter: Notifications Setup...
flutter: Batch 1 / 7 - total inserted 15000 / 102386 of folder: LoansFolder.closed
flutter: Batch 2 / 7 - total inserted 30000 / 102386 of folder: LoansFolder.closed
flutter: Batch 3 / 7 - total inserted 45000 / 102386 of folder: LoansFolder.closed
flutter: Batch 4 / 7 - total inserted 60000 / 102386 of folder: LoansFolder.closed
flutter: Batch 5 / 7 - total inserted 75000 / 102386 of folder: LoansFolder.closed
flutter: Batch 6 / 7 - total inserted 90000 / 102386 of folder: LoansFolder.closed
flutter: Batch 7 / 7 - total inserted 102386 / 102386 of folder: LoansFolder.closed
flutter: Batch 1 / 1 - total inserted 3311 / 3311 of folder: LoansFolder.activeloans
flutter: Batch 1 / 2 - total inserted 15000 / 19838 of folder: LoansFolder.prospects
flutter: Batch 2 / 2 - total inserted 19838 / 19838 of folder: LoansFolder.prospects
flutter: Batch 1 / 1 - total inserted 3311 / 3311 of folder: LoansFolder.mypipeline
flutter: Loans Finished Inserting
flutter: Total contacts: 129
flutter: Contacts Finished Inserting
flutter: Batch 1 / 7 - total inserted 25000 / 166313
flutter: Batch 2 / 7 - total inserted 50000 / 166313
flutter: Batch 3 / 7 - total inserted 75000 / 166313
flutter: Batch 4 / 7 - total inserted 100000 / 166313
flutter: Batch 5 / 7 - total inserted 125000 / 166313
flutter: Batch 6 / 7 - total inserted 150000 / 166313
flutter: Batch 7 / 7 - total inserted 166313 / 166313
flutter: Leads Finished Inserting
flutter: Opportunity Alerts Total: 151673
flutter: Warning database has been locked for 0:00:10.000000. Make sure you always use the transaction object for database operations during a transaction

UPDATE: 11/21/19 - So it seems I was just lucky yesterday. The db locked and crashed the app while batch inserting; my debug console is just an endless stream of db-locked warnings. Can reading and writing at the same time cause issues? I have SQL statements watching each list's stream while we are inserting. The locking seems random: I can insert everything fine one time, and the next time it might lock in the first batch or halfway through. Not sure what could be causing this. If you need any more info, I can provide it.
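The queue approach mentioned above can be sketched by chaining every write onto a single Future, so only one transaction runs at a time (`enqueueWrite` and `_tail` are hypothetical helpers, not moor API):

```dart
// Sketch: serialize writes so transactions never run concurrently.
// _tail always points at the most recently scheduled job.
Future<void> _tail = Future.value();

Future<T> enqueueWrite<T>(Future<T> Function() job) {
  final result = _tail.then((_) => job());
  // Swallow errors in the chain so one failed job doesn't block the queue.
  _tail = result.then((_) {}, onError: (_) {});
  return result;
}

// Usage: every bloc routes its batch update through the queue, e.g.
// enqueueWrite(() => loansDao.batchUpdate(userId, loans, folder: folder));
```

Because each job only starts when the previous Future completes, the queue guarantees at most one open write transaction regardless of how many blocs fire at login.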

simolus3 commented 4 years ago

Sorry, I was slow to see your updates. I've tried to set up a small repro, but so far I'm unable to reach a state where the db locks up.

I have SQL statements for watching each lists stream as we are inserting

Just to clarify: Are you running selects that span the entire table? Moor and the underlying library, sqflite, process those results as a single list. Each result is also sent across platform channels in its entirety. I could imagine that things start to slow down significantly at some point.

DevonJerothe commented 4 years ago

Hey, so I was actually just about to respond with my findings.

I am using legacy custom queries (planning on moving to .moor) to watch each table. I am selecting based on folders (categories), and the returned list can range anywhere from 0 to inf. I was reading through #254 and limited the queries to 500 rows each, and I have not run into any locks since. I am still getting UI freezes for the split second that I'm calling each insert, though.
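Bounding the watched result set as described can be sketched with moor's query builder (`watchFolder` is a hypothetical DAO method; the legacy custom-query version would simply append `LIMIT 500` to the SQL):

```dart
// Sketch: watch only a bounded slice of a folder, so each stream update
// materializes at most `limit` rows instead of the whole table.
Stream<List<LoanRecord>> watchFolder(String folder, {int limit = 500}) {
  return (select(loanRecords)
        ..where((t) => t.folder.equals(folder))
        ..limit(limit))
      .watch();
}
```

Since moor and sqflite send each query result across the platform channel as a single list, capping the row count directly caps the per-update serialization cost.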

simolus3 commented 4 years ago

Well that's a bit of an improvement, at least :)

I am still getting UI freezes for the split second I'm calling each insert

I'll do some proper performance analysis based on this use case, but the thing is that just sending the insert statements to the database requires quite some work on the UI thread:

To be honest, I haven't really dug into the performance of moor for big operations yet. At the scale where I use it, it always felt fast enough. Maybe there's a clear bottleneck that can be optimized. Using moor_ffi can also eliminate most of the second step, but ffi doesn't quite work on Flutter yet.

firatcetiner commented 4 years ago

Inserting 20+ entities in a batch definitely freezes the UI, very badly. Additionally, using streams to map the data from the db makes it worse, since on each insert moor sends a select request to update the data. That seems like a lot of work.

@simolus3 Do you have any optimization plans for this, or any suggestion while inserting lots of entities together and also listening table entities with a Stream?

simolus3 commented 4 years ago

Inserting 20+ entities in a batch definitely freeze the UI, very badly

Wait, what kind of entities are you inserting? Do they have a huge BLOB column attached to them or anything? 20 entities should be nothing, I've never seen any performance issues on that scale.

since on each insert moor sending a select request to update the data

This is not true when using moor's batch API (or transactions in general). All changes in a transaction are combined and then dispatched when the transaction completes. So inserting 1000 entries in a batch should only update streams once.
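As a sketch of that batch API (`items` is assumed to be a prepared list of companions; `InsertMode` names as in moor), all of these inserts share one transaction and notify query streams only once, when the batch completes:

```dart
// Sketch: many inserts, one transaction, one stream notification.
await batch((b) {
  for (final item in items) {
    // insertOrReplace mirrors the orReplace behavior used elsewhere in
    // this thread; use InsertMode.insert for plain inserts.
    b.insert(loanRecords, item, mode: InsertMode.insertOrReplace);
  }
});
```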

firatcetiner commented 4 years ago

Wait, what kind of entities are you inserting? Do they have a huge BLOB column attached to them or anything? 20 entities should be nothing, I've never seen any performance issues on that scale.

The entity has 6 TextColumns, 2 IntColumns and 2 BoolColumns, some of them nullable. I'm fetching a list of JSON objects from an API and inserting each one individually, passing an Entity.fromJson(jsonEntry) to each insert operation.

This is not true when using moor's batch api (or transactions in general). All changes in a transaction are combined an then dispatched when the transaction completes. So inserting 1000 entries in a batch should only update streams once.

I am listening to all the entities from the database with an appropriate query.watch(). The resulting stream is passed directly into a StreamBuilder widget (see the attached figure). You can see that for each insert operation there is a select statement right after all the transactions complete. Am I doing it wrong? What is the correct way of listening to entities?

Edit: I should say more about what I'm doing. There is a table called Notifications; receiving notifications from the API and inserting them is not a problem. Say I want to change the is_read field for all of the notifications, to mark them as read. Naively updating each notification individually causes moor to send a select statement on each update operation, as can be seen in the figure. I feel like I'm approaching this wrong, since the performance is very bad. Maybe I should use transaction() instead of simple update and insert operations.

simolus3 commented 4 years ago

Say I want to make change is_read field for all of the notifications, to mark them as read

If you want to update all of them, you can do that with a single statement, something like update(notifications).write(NotificationsCompanion(isRead: Value(true)));. If you only need a subset, using a where clause will likely be more efficient than updating rows individually.

Maybe I should try to use transaction() instead of simple update, insert operations.

If the above doesn't work for you, then yes! Wrapping many small updates in a transaction is easy to do and can improve performance a lot.
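Combining both suggestions, a subset update can be sketched like this (assuming a `Notifications` table with a boolean `isRead` column, as described above):

```dart
// Sketch: mark every unread notification as read in one UPDATE statement.
// The where clause restricts the write to the unread subset; dropping it
// would update every row in the table.
await (update(notifications)
      ..where((t) => t.isRead.equals(false)))
    .write(const NotificationsCompanion(isRead: Value(true)));
```

Because this is a single statement, watching streams are notified once, instead of once per updated row.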

firatcetiner commented 4 years ago

Thanks a lot! And thank you for this awesome plugin.

felipecastrosales commented 5 months ago

Any news about this?