tekartik / sqflite

SQLite flutter plugin
BSD 2-Clause "Simplified" License
2.87k stars 521 forks source link

How to handle large inserts (1+ million items) in sqflite? #873

Open raphasauer opened 2 years ago

raphasauer commented 2 years ago

Hello. I am currently working on an application that needs to insert large amounts of data into a local SQLite database using the sqflite plugin. I have attempted the following strategies:

I am using the latest version of Flutter (3.3.3) and the latest version of sqflite (2.1.0). Performance wise, this is how much time the app is taking to perform this operation:

Phone Model Strategy Time (s)
Galaxy S10 Lite Batch 172
Galaxy S10 Lite Raw Insert 513
Galaxy S22 Batch 108
Galaxy S22 Raw Insert 343
Moto G5 Batch 1008
Moto G5 Raw Insert 422

Is there a better way to handle such a large insert?

alextekartik commented 2 years ago

Thanks for the benchmark information!

Ouch that's huge number that I have not tested myself. Sqflite itself has no specific speed optimization other than common sense and I never performed any benchmark (so any data is welcome). Just sending data to insert through flutter services to the plugin has a cost (memory and duration). For batch (that I would personally use), did you set noResult to true in the commit method? This you should speed up a little bit by.

An alternative solution would be to use sqflite_common_ffi (similar API - see here). It might give better result (although I have not compared yet)

There is even a DatabaseFactory (you'll get what it means once you have read the link above) that runs in the same isolate (i.e. to plugin services, not extra thread, see here). Unlike stated, it also works on iOS and Android. So if your query can run in the main thread (i.e. blocking the UI), that it something you can try too.

raphasauer commented 2 years ago

Thanks for the quick response!

We did set the noResult=true in the batch when committing. I am going to benchmark the sqflite_common_ffi and DatabaseFactory methods and update the results in this issue, for further reference.

AndreLuizNogueira commented 1 year ago

I am also trying to improve performance on old devices. Did you finish those tests?

raphasauer commented 1 year ago

@jerckov40 we couldn't modify our app to test the changes suggested by @alextekartik. We solved our issue by switching to the sqlite3 plugin for those inserts; I don't know why there is such a stark difference between the implementations when it comes to insertion time, but we noticed a large improvement with the other library. S10 Lite now inserts +8 million items in under 2 minutes, for instance.

AndreLuizNogueira commented 1 year ago

@jerckov40 we couldn't modify our app to test the changes suggested by @alextekartik. We solved our issue by switching to the sqlite3 plugin for those inserts; I don't know why there is such a stark difference between the implementations when it comes to insertion time, but we noticed a large improvement with the other library. S10 Lite now inserts +8 million items in under 2 minutes, for instance.

do you mean the sqflite_common_ffi library? I already use it on windows. I'll try on mobile

raphasauer commented 1 year ago

do you mean the sqflite_common_ffi library?

No, I mean this lib.

AndreLuizNogueira commented 1 year ago

do you mean the sqflite_common_ffi library?

No, I mean this lib.

I am Trying to use that lib, but on my tests on windows with 13000 inserts it is taking 37038 ms, with the sqflite_common_ffi library it was 1098 ms. can you show some code of how you did it ?

this is the code I tried:

final stmt = db.prepare('INSERT INTO Test (id) VALUES (?)'); for (int cont = 0; cont < 13000; cont++) { stmt.execute([cont]); }

alextekartik commented 1 year ago

Of course nothing can beat sqlite3 library which is a direct link to SQLite shared library using ffi. sqflite_common_ffi is a wrapper around it (so of course slower) and adds:

If you know what you are doing, familiar with SQLite and have strong performance requirement, sqlite3 is the way to go. sqflite is just an opinionated layer on top of various implementation (native plugin, ffi), so it has its downsides. I try to maintain it the best I can (since 2018) and I follow Simon Binder work on sqlite3 as he has provided the best implementation so far.

@jerckov40 you should create a transaction, it will dramatically improve performances here.

AndreLuizNogueira commented 1 year ago

Of course nothing can beat sqlite3 library which is a direct link to SQLite shared library using ffi. sqflite_common_ffi is a wrapper around it (so of course slower) and adds:

  • an API shared with sqflite (helper for open and transactions)
  • running in a separate isolate (i.e. it won't block the UI for extensive statements such as inserting millions of items). So of course this means sharing data with the isolate (slow).

If you know what you are doing, familiar with SQLite and have strong performance requirement, sqlite3 is the way to go. sqflite is just an opinionated layer on top of various implementation (native plugin, ffi), so it has its downsides. I try to maintain it the best I can (since 2018) and I follow Simon Binder work on sqlite3 as he has provided the best implementation so far.

@jerckov40 you should create a transaction, it will dramatically improve performances here.

I am already using batch insert with sqflite_common_ffi. and it was faster then sqlite3. sqflite_common_ffi took 1098 ms while sqlite3 took way more, 37038 ms. that is why i asked for a sample of what he did, I dont believe that I used sqlite3 the right way, it should be faster then sqflite_common_ffi

alextekartik commented 1 year ago

@jerckov40 you should create a transaction, it will dramatically improve performances here.

I dont believe that I used sqlite3 the right way,

Sorry I was not clear. You have to execute explicitly BEGIN TRANSACTION (before) and COMMIT (after) in the code you provide. It should be a lot faster:

db.execute('BEGIN TRANSACTION');
final stmt = db.prepare('INSERT INTO Test (id) VALUES (?)');
for (var cont = 0; cont < 13000; cont++) {
  stmt.execute([cont]);
}
db.execute('COMMIT');
AndreLuizNogueira commented 1 year ago

That Was it. Now sqlite3 took 55 ms. and that was just 1 table, I download multiple 50k row tables from back-end, now it will run all in less then 1 second. Thank you, all my app users will love this update.

isn't it possible to create some new batch function like this on sqflite_common_ffi ?

I will only change my batch download function to sqlite3, every othen screen will still use sqflite_common_ffi due to wrapper conveniences.

alextekartik commented 1 year ago

isn't it possible to create some new batch function like this on sqflite_common_ffi ?

Well..that's actually the purpose of Batch (it spawn a bunch of SQL query in an isolate)

The equivalent code would be:

var batch = db.batch();
for (var cont = 0; cont < 13000; cont++) {
  batch.execute('INSERT INTO Test (id) VALUES (?)', [cont]);
}
await batch.commit(noResult: true);

But yes I know it is still slower (just sending 13000 statement to the isolate has a cost)

If you don't want to use any isolate you can use the databaseFactoryFfiNoIsolate factory which be a faster but raw sqlite3 will give you the best performance.

I will only change my batch download function to sqlite3, every othen screen will still use sqflite_common_ffi due to wrapper conveniences.

I'm not sure I can recommend mixin both as it means accessing the database from multiple isolates so you might have to deal with database locked issue.

Example

Below a complete example comparing 3 solutions (sqflite_common_ffi, sqflite_common_ffi (no isolate), raw sqlite3);

import 'dart:async';

import 'package:sqflite_common_ffi/sqflite_ffi.dart';
import 'package:sqlite3/sqlite3.dart';

Future main() async {
  var createTableStatement =
      'CREATE TABLE IF NOT EXISTS Test (id INTEGER PRIMARY KEY);';
  var clearTableStatement = 'DELETE FROM Test;';
  var insertStatement = 'INSERT INTO Test (id) VALUES (?)';
  var dbPath = 'test_perf.db';
  var count = 130000;
  var db = await databaseFactoryFfi.openDatabase(dbPath);
  await db.execute(createTableStatement);
  await db.execute(clearTableStatement);
  var sw = Stopwatch()..start();
  var batch = db.batch();
  for (var i = 0; i < count; i++) {
    batch.execute(insertStatement, [i]);
  }
  await batch.commit(noResult: true);
  print('${sw.elapsedMilliseconds} ms - sqflite_common_ffi');
  await db.close();

  db = await databaseFactoryFfiNoIsolate.openDatabase(dbPath);
  await db.execute(createTableStatement);
  await db.execute(clearTableStatement);
  sw = Stopwatch()..start();
  batch = db.batch();
  for (var i = 0; i < count; i++) {
    batch.execute(insertStatement, [i]);
  }
  await batch.commit(noResult: true);
  print('${sw.elapsedMilliseconds} ms - sqflite_common_ffi (no isolate)');
  await db.close();

  var rawDb = sqlite3.open(dbPath);
  rawDb.execute(createTableStatement);
  rawDb.execute(clearTableStatement);
  final stmt = rawDb.prepare(insertStatement);

  rawDb.execute('BEGIN TRANSACTION');
  sw = Stopwatch()..start();
  for (var i = 0; i < count; i++) {
    stmt.execute([i]);
  }
  rawDb.execute('COMMIT');
  print('${sw.elapsedMilliseconds} ms - raw sqlite3 ');
  rawDb.dispose();
}

which on my Linux gives:

492 ms - sqflite_common_ffi
364 ms - sqflite_common_ffi (no isolate)
62 ms - raw sqlite3 

As you can see sqlite3 is way faster. Here I am, bashing my own work. I'm open to improvement of course!

AndreLuizNogueira commented 1 year ago

I don't know if it makes any difference, but batch.update/insert receives a "Map<String, Object?> values", while sqlite3 receives "List<Object?> parameters".... do you convert the map to List<Object?> ? if yes, receiving List<Object?> whouldn't be faster ?

also, another thing I'll try is to drop indexes and recreating afer insert, it works on my firebird backend. maybe try to implement this drop/recreate automaticly

alextekartik commented 1 year ago

I don't know if it makes any difference, but batch.update/insert receives a "Map<String, Object?> values", while sqlite3 receives "List<Object?> parameters".

batch.update is an helper where you don't have to write a full typical update request, see batch.rawUpdate for the raw API (which takes a List<Object?> parameters. See the difference between update and rawUpdate for Database API - batch is different because you don't get the result right away but the syntax is similar.

This API is insipred from Android API

ced1check commented 6 months ago

I experienced the same issue when inserting 3K items in a DB table, it took 30 seconds on WSA or an old Android 8 device.

Further testing on WSA, using batch didn't make any difference, but using BEGIN/END TRANSACTION resulted in an insert of less than a second!

FWIW, the same code (without transactions) took less than a second on iOS or on recent Android versions.

I'm glad I found this thread!

AndreLuizNogueira commented 6 months ago

I experienced the same issue when inserting 3K items in a DB table, it took 30 seconds on WSA or an old Android 8 device.

Further testing on WSA, using batch didn't make any difference, but using BEGIN/END TRANSACTION resulted in an insert of less than a second!

FWIW, the same code (without transactions) took less than a second on iOS or on recent Android versions.

I'm glad I found this thread!

I ended up using SQLite3 for my download function, because it's much faster, and sqflite for local queries. Just a warning... If you are going to do this, make sure you don't use both at the same time and reopen sqflite after using sqlite3

ced1check commented 6 months ago

I ended up using SQLite3 for my download function, because it's much faster, and sqflite for local queries. Just a warning... If you are going to do this, make sure you don't use both at the same time and reopen sqflite after using sqlite3

For now in DEBUG mode results are pretty ok, everything loads in split-seconds. However once I go in production, issue is worse on Android.

It appears to be due to the fact that we use a native app, running multiple flutter engines, which most end-up with database is locked. Sometimes it overcomes the issue after several minutes, but most of the time almost no data is loaded.

Not sure moving to sqlite3 will solve that problem.

AndreLuizNogueira commented 6 months ago

I ended up using SQLite3 for my download function, because it's much faster, and sqflite for local queries. Just a warning... If you are going to do this, make sure you don't use both at the same time and reopen sqflite after using sqlite3

For now in DEBUG mode results are pretty ok, everything loads in split-seconds. However once I go in production, issue is worse on Android.

It appears to be due to the fact that we use a native app, running multiple flutter engines, which most end-up with database is locked. Sometimes it overcomes the issue after several minutes, but most of the time almost no data is loaded.

Not sure moving to sqlite3 will solve that problem.

I had these database locking errors so I said don't use both at the same time and reopen sqflite after using sqlite3. In my code I only need the performance to save in 1 function... when I call it the user receives a loading screen, so he cannot open another screen that uses sqflite... in the end I close sqlite3 and reopen sqflite.

On the readme page, there is this under Current Issues:

"Due to the way transaction works in SQLite (threads), concurrent read and write transaction are not supported. All calls are currently synchronized and transactions block are exclusive. I thought that a basic way to support concurrent access is to open a database multiple times but it only works on iOS as Android reuses the same database object. I also thought a native thread could be a potential future solution however on android accessing the database in another thread is blocked while in a transaction..."

but this is just to improve my recording performance. I also made other changes, such as adding an update_time field in my backend and only querying updated after the maximum time I have.

ced1check commented 6 months ago

I had these database locking errors so I said don't use both at the same time and reopen sqflite after using sqlite3. In my code I only need the performance to save in 1 function... when I call it the user receives a loading screen, so he cannot open another screen that uses sqflite... in the end I close sqlite3 and reopen sqflite.

On the readme page, there is this under Current Issues:

"Due to the way transaction works in SQLite (threads), concurrent read and write transaction are not supported. All calls are currently synchronized and transactions block are exclusive. I thought that a basic way to support concurrent access is to open a database multiple times but it only works on iOS as Android reuses the same database object. I also thought a native thread could be a potential future solution however on android accessing the database in another thread is blocked while in a transaction..."

but this is just to improve my recording performance. I also made other changes, such as adding an update_time field in my backend and only querying updated after the maximum time I have.

We also use update_time to load the DB, except during initial loading, which is the only process that causes issues.

However the same flutter modules (5 at this time) running in a pure flutter app works fine on Android and a single flutter module runs fine when used from native on Android as well. It's only when multiple modules are loaded simultaneously that we're experiencing issues on initial load, once loaded everything runs smoothly.

It'll be hard and time-consuming to migrate to sqlite3 entirely, so I'll first try to load modules at intervals to see if it solves this, otherwise I'll definitely migrate the entire DB stack. Supposing the mapping will have to be reimplemented.

AndreLuizNogueira commented 6 months ago

I had these database locking errors so I said don't use both at the same time and reopen sqflite after using sqlite3. In my code I only need the performance to save in 1 function... when I call it the user receives a loading screen, so he cannot open another screen that uses sqflite... in the end I close sqlite3 and reopen sqflite. On the readme page, there is this under Current Issues: "Due to the way transaction works in SQLite (threads), concurrent read and write transaction are not supported. All calls are currently synchronized and transactions block are exclusive. I thought that a basic way to support concurrent access is to open a database multiple times but it only works on iOS as Android reuses the same database object. I also thought a native thread could be a potential future solution however on android accessing the database in another thread is blocked while in a transaction..." but this is just to improve my recording performance. I also made other changes, such as adding an update_time field in my backend and only querying updated after the maximum time I have.

We also use update_time to load the DB, except during initial loading, which is the only process that causes issues.

However the same flutter modules (5 at this time) running in a pure flutter app works fine on Android and a single flutter module runs fine when used from native on Android as well. It's only when multiple modules are loaded simultaneously that we're experiencing issues on initial load, once loaded everything runs smoothly.

It'll be hard and time-consuming to migrate to sqlite3 entirely, so I'll first try to load modules at intervals to see if it solves this, otherwise I'll definitely migrate the entire DB stack. Supposing the mapping will have to be reimplemented.

I believe you can't do this because of this simultaneous transaction problem... I once saw a comment asking how to use 2 applications with 1 database. They said it was possible, but only with a third application controlling the database, and the others accessing it through that third application....

before going to sqlite3 consider this, I tried it and came back because sqlite3 is synchronous, light queries ok but anything heavy enough will block your application for a few seconds. in fact, I asked about it and Simon Binder recommended using drift. I didn't find the post, but that is my next move, after some performance tests.