tekartik / sembast.dart

Simple io database
BSD 2-Clause "Simplified" License

Deleted row in the .db file #359

Closed: xOldeVx closed this issue 5 months ago

xOldeVx commented 1 year ago

First of all I must say it's the best package I've used in Flutter, especially since it isn't schema-based, which makes the db really flexible!

I noticed that every delete is appended to the .db file. I think it's quite unnecessary and can increase the size of the application, especially in a case like mine where I use this action quite a lot. Is there a reason that the delete lines go into the db file?

It looks like this:

{"version":1,"sembast":1}
....
{"key":3,"deleted":true,"store":"my_store"}
{"key":4,"deleted":true,"store":"my_store"}
{"key":5,"deleted":true,"store":"my_store"}
{"key":6,"deleted":true,"store":"my_store"}

And one more thing: is there a benchmark comparing it to other dbs?

xOldeVx commented 1 year ago

I'm using the put method to replace an existing object, and I noticed that the .db file adds it as an additional row (with the same key!). Here's a snippet of my .db file:

{"key":"videoId2","store":"lv","value":...}
{"key":"videoId2","store":"lv","value":...}
{"key":"videoId2","store":"lv","value":...}

I see there's no workaround to prevent the .db file size from growing. It can be a big problem for the device's CPU and memory. Any idea?

alextekartik commented 1 year ago

Sembast uses an append-only format which is explained here: https://github.com/tekartik/sembast.dart/blob/master/sembast/doc/storage_format.md

At some point the data will be compacted to remove the obsolete lines. sembast_sqflite does not have this problem.

There is no dedicated benchmark. I have an old benchmark here https://github.com/tekartik/sembast.dart/blob/master/sembast/doc/perf_io.md that is not really useful. Initial data loading time can be painful for big databases, so sembast is not recommended for all situations. Isar, Hive, ObjectBox, and Realm will likely have better performance. I personally consider that the database is slow (like firestore, for example) and have the application work around this to give the best user experience.

xOldeVx commented 1 year ago

In my case I added 100,000 rows; the file grew to ~20 MB and my device's RAM usage went up from 221 MB to 297 MB 😳.

I moved from Hive to Sembast because Hive doesn't support Lists, doesn't support sorted queries, requires a code generator, and is based on Objects (not flexible enough for me)... Sembast has so many benefits, but I think this issue can be a big drawback for any application, because weak devices, or even powerful devices, can easily hit an out-of-memory exception or lag.

I'm thinking about a solution (at least on my side): occasionally fetch all the data, drop the db (I see that empties the .db file), and then insert all the data into the db again. It can be a dangerous solution since data could be lost.. I need to think about how to make it safer.

A better solution would be to just overwrite the current line when using put to replace an existing object (like its behavior in transactions!). I would like to know if a fix like this is coming.

alextekartik commented 1 year ago

Indeed sembast might not be the best solution for your 100 000 records case. I'm glad you're trying it since that makes a good test, but I don't want you to be disappointed if at some point you reach its limits.

I think this issue can be a big drawback for any application, because weak devices, or even powerful devices, can easily hit an out-of-memory exception or lag.

You're correct, sembast is not very memory friendly so a lot of records on old devices might not work at all.

A better solution would be to just overwrite the current line when using put to replace an existing object (like its behavior in transactions!). I would like to know if a fix like this is coming.

Unfortunately you cannot replace a line in a file without rewriting most of the data (i.e. slow). That is the reason for using an append-only mode. What you want to do is exactly what happens during a "compact" operation, which you cannot control (but I could expose an entry point for that). Obsolete lines are purged and the whole file is written again. This operation is automatically triggered when some conditions are met (for example, 20% of lines being obsolete).

In v3.4.7, which I just published, you can now explicitly call database.compact(). But a fix like you want (replace line) will not happen in "sembast io". As I said above, sembast_sqflite does not have this problem (but the sqlite db might be bigger anyway).
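A minimal sketch of the new call, assuming a database opened with the io factory as above (the file name is hypothetical):

```dart
import 'package:sembast/sembast.dart';
import 'package:sembast/sembast_io.dart';

Future<void> main() async {
  final db = await databaseFactoryIo.openDatabase('my.db');
  // ... puts and deletes accumulate obsolete lines over time ...
  // Rewrites the file without the obsolete lines (sembast >= 3.4.7).
  await db.compact();
  await db.close();
}
```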

xOldeVx commented 1 year ago

It's working, database.compact() solves the problem, thanks!! 🤩

Indeed sembast might not be the best solution for your 100 000 records case

I think you didn't understand what I mean. Think about an application with a limit of 1000 rows of video history: if a user clicks on ~200 videos per day (just clicks a video and it's added to the history), and the .db file is append-only, it can reach 100,000 lines after about 16 months.. If there are more items saved in the db (and there are), it can get there even faster, and it's bad that an app limited to 1000 records weighs and behaves like one with 100,000 records.

You're correct, sembast is not very memory friendly so a lot of records on old devices might not work at all.

Let me make sure I understand you.. the problem is lots of unused records (records that were deleted or updated but remain in the .db file, like in my example above), but using about 1,000 records and keeping the .db file fit to 1,000 is fine and friendly.. right?

Unfortunately you cannot replace a line in a file without rewriting most of the data

Ohh.. of course, I understand, thanks!

alextekartik commented 1 year ago

I think you didn't understand what I mean. Think about an application with a limit of 1000 rows of video history: if a user clicks on ~200 videos per day (just clicks a video and it's added to the history), and the .db file is append-only, it can reach 100,000 lines after about 16 months..

Ok, understood, but in this case, if you limit your store to 1000 rows, the saved file should never have more than about 1200 lines (i.e. about 20% extra; compaction will happen automatically at some point), even if you write a record 1 million times (assuming you delete the old entries as you add new ones), as in the sketch below.
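A minimal sketch of that pattern (deleting the oldest entries as new ones are added); the store name, the Finder-based trimming, and the 1000 cap are illustrative assumptions, not the only way to do it:

```dart
import 'package:sembast/sembast.dart';

// Adds one history entry, then trims the store back to the newest 1000
// records. The deletes append tombstone lines, but automatic compaction
// keeps the file around the size of the live data.
Future<void> addToHistory(Database db, Map<String, Object?> entry) async {
  final store = intMapStoreFactory.store('history');
  await store.add(db, entry); // auto-increment int key
  final count = await store.count(db);
  if (count > 1000) {
    final finder = Finder(
      sortOrders: [SortOrder(Field.key)], // oldest keys first
      limit: count - 1000,
    );
    await store.delete(db, finder: finder);
  }
}
```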

but using about 1,000 records and keeping the .db file fit to 1,000 is fine and friendly.. right?

Correct. Obsolete lines should not have any memory cost, unless there is a bug somewhere! Compacting could be slow though; sembast_sqflite would be a better fit in your scenario (it is just a change in the database factory definition) and would avoid too many writes to disk.
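A minimal sketch of that factory change, assuming the sembast_sqflite and sqflite packages are in the pubspec (the file name is hypothetical):

```dart
import 'package:sembast/sembast.dart';
import 'package:sembast_sqflite/sembast_sqflite.dart';
import 'package:sqflite/sqflite.dart' as sqflite;

Future<Database> openHistoryDb() async {
  // Same sembast API on top of sqlite storage; only the factory changes.
  final factory = getDatabaseFactorySqflite(sqflite.databaseFactory);
  return factory.openDatabase('history.db');
}
```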

xOldeVx commented 1 year ago

I will try sembast_sqflite, thank you!