Closed monsieurtanuki closed 3 years ago
There are 2 distinct uses:
using the database ase a cache for the product information (that we get from the API on a per product basis, or that we could get from search queries as well, or that we (or the user) could prepopulate). For this storing JSON works very well, we can cache the product data + attributes values.
storing lists of products (the most basic one being a scan history, or a "favorites" list). Here SQLite is perfect, as we just store barcodes + context. Then we can query the Sembast cache for the actual product data. We can have tables for the result (we have 5 instances of product barcode 12345678 with different best before dates), as well as table for transactions (I have eaten 12 times this product in the last month). We have a new project for that we will be starting in the next few weeks. It won't be limited to the app though, we'll need to see how we can sync things between the app, the server, different users etc.)
Regarding the database, the technical document that @PrimaelQuemerais has written for the app has a large section on why we need one and what we want to do with it: https://docs.google.com/document/d/1dOryz657AaAC7feaVMWvzdxB8RQ1Ze8O7PEk1QiJUww/edit#heading=h.u53wgh648h3k
Disclaimer: I'm a former dev Oracle DBA, and I spent years creating, loading and querying relational databases with optimization in mind. It doesn't mean I know everything and I'm always right, it means I can be trusted regarding relational database (e.g. SQLite). I don't know that much about NoSQL.
A typical use of the full product database is to get the product attached to a barcode (or products attached to barcodes).
It's a normal use of a relational database, whose core is the "primary key" (a unique and non null value), and which is optimized for that (through the transparent use of "unique index").
A similar behavior can be found in Sembast
, with a syntax like await store.record(barcode).put(db, product.toJson());
and final Map<String, dynamic> jsonProduct = await store.record(barcode).get(database) as Map<String, dynamic>;
.
The good thing about Sembast is that it supports Map<String, dynamic>
as a variable type, and we are already comfortable with JSON (as we get JSON data from internet, and we will store the same data in the database). In SQLite, basically we would have to encode/decode the JSON to/from String
.
An additional good thing about Sembast is that, as it stores JSON, we may be able to run queries like that:
await store.find(database,
finder: Finder(
filter: Filter.and([
Filter.matches('product_name', 'Wasa'),
Filter.greaterThan('ecoscore', 'b'),
])));
We couldn't do that easily in SQLite, as we would have to scan the whole product database, decode Strings
into JSON and so on...
Let's talk about the second use case: history, favorites and custom lists of products.
Both Sembast and SQLite (and even SharedPreferences
) can handle that.
I would even say that SQLite is not appropriate for a custom list of products that the end user can reorder: if we want to change the order, we need to change all the records' custom order value. In Sembast, we can just store a List
of barcodes, let the end user reorder it, then save in one step the List
(List<dynamic>
is supported by Sembast).
Anyway, I don't think it would make sense to onboard a second database system just for that list management.
It was not my initial choice, but from what I read about Sembast I think it would be perfect for both use cases.
Later, we may find issues with the performances (too much memory involved, too slow queries, too slow app startup), but for the moment I think we can stay with one database system, named Sembast.
Thank you for the detailed review. Performances was indeed one of the concerns we had while evaluating the databases. I did write a benchmark a while ago but sadly can't find the results back. I recall Sembast was a little bit slow for writing big chunks of data in one shot, nevertheless writing can be done asynchronously without impacting the user experience so we decided to stick with it.
The JSON format and the query system were the reasons why we preferred Sembast over something like Hive which performances were superior but without any custom query possible. The main advantage of storing the data in a JSON format is the ability to use the classes and functions from openfoodfacts-dart, querying products from the local database is done in the exact same way as querying products from the API.
Regarding the order of lists: I don't really see the use case for letting users re-order manually the lists in a custom order (as in dragging and dropping a list item at a specific place). But there is a big use case to let them display their lists in different orders (e.g. by date of adding the product to the list, by number of days before the products will perish, by number of instances of the product they have consumed / they currently own, by their food preferences, or something else etc.).
One use case to think of: for a given product, we need to know in which lists it belongs to (if any). e.g. you scan a product, and the app tells you it's already in the list "Products in my fridge", or that you scanned it one month ago.
Those use cases can be implemented in many different ways of course.
@stephanegigandet OK, we won't reorder the lists. I agree with you: those use cases can be implemented with different way of thinking or database system.
@PrimaelQuemerais Performances are already flagged as problematic with Sembast, then.
My main concern with Sembast is this: "The whole document based database resides in a single file and is loaded in memory when opened." (found here). That means the bigger the database, the slower the database opening and the higher the level of memory used. It stinks.
With SQLite, the startup is "almost" immediate : "Whenever a database is opened, the entire schema is scanned and parsed and a parse tree for the schema is held in memory. That means that database connection startup time and initial memory usage is proportional to the size of the schema." (the schema meaning the metadata about the tables) (found here). That means the database size has no impact on the duration of the database opening and the amount of memory used.
From what I read so far, smoothie's only concern is "I give you a barcode, you give me the JSON". Not something more elaborated like Filter.and([Filter.matches('product_name', 'Wasa'), Filter.greaterThan('ecoscore', 'b')])
.
In that case we can use SQLite; the only difference is that we need an extra step with the standard conversion json.decode
/ json.encode
. How much more does it cost?
I'm ready to run a test between Sembast and SQLite:
json.encode(myjson)
)Do you agree with that test, guys?
@monsieurtanuki Just curious, with SQLite, do we have to pre-declare the maximum size of the JSON when the table is created (e.g. something like VARCHAR(4096), or is that handled automagically?
One thing that would be very useful to test as well: what if we have 10k products, 100k products, or even 1M products. e.g. for offline use, we could do something similar to OsmAnd where you can download maps for offline use for the countries you want. We could for instance have downlable archives of the top 10k or 100k most popular products. Or as you suggested, more specific datasets like all vegetarian products or all products sold in a specific store.
@stephanegigandet SQLite does not care about variable types, actually. Could be a number, could be a string (but no specified size). And there's no limit - actually there's one limit: 1 billion bytes. We'll have a problem with VERY fat foods ;)
There shouldn't be a problem with the number of products in SQLite (our main concern will be with the primary key index's size). My first test will be Sembast vs. SQLite, until the first of them crashes. Then, another test until the winner crashes.
PS: funny you mention OsmAnd as I contribute to osmdroid.
Guys, we're lucky. I've got a 6-year-old canary-in-the-coalmine smartphone.
My first test crashed with Sembast: I couldn't reopen my 8K product database. While the SQLite was pretty cool. As initially thought, Sembast asks for too much memory when opening the database because it loads all the data.
I'll give you more details tomorrow, but to me it's a NO GO for Sembast.
Once the functions to store and retrieve a Product from the database are written, Sembast offers little to no advantage over SQLite, except that each time we update the Product class the functions need to be updated aswell, which wasn't the case with Sembast.
The benchmark seems a little bit unfair to me as Sembast would allow to query elements by looking at nested fields in the product, which would not be possible if the data is stored in SQLite in a single field. That being said the chose comes down to which features we actually need. Waiting for the benchmark details :)
More details about the benchmark:
sembast: ^2.4.8+1
), and with SQLite (sqflite: ^1.3.2+1
), on my humble smartphone, in flutter run "profiler" modeResults:
json.decode
step to store as a String
btw)Memory usage with Sembast (I don't have similar screenshot for SQLite, the memory usage is so low it doesn't show)
Thanks for all the tests!
Saving 1000 new products in the database (duration)
- Sembast: always around 7s
- SQLite: always around 12s (there's an extra
json.decode
step to store as aString
btw)
I wonder if those 12 seconds are mostly due to the json.encode. What if we want to insert 10k products (from a downloadable file with the most popular products for instance), where each product data is available as a json string. Could you check how long it would take?
@stephanegigandet I relaunched the same SQLite test: this time loading 1K product took around 8s. And when I used json.encode
only once every 100 products (instead of for every product), loading 1K product took around 5.5 s.
For the details: when I load 1K products I run a 10-time loop of running a transaction
of 100 insert statements.
I think the database insert performances are not as important as the database opening:
Batch
, maybe with a better management of async
transactions and insertsThanks for all the tests. I agree SQLite seems to be the best option for our needs.
@stephanegigandet Great! Let's start with SQLite then. I'll send a PR within a week.
We won't be stuck with SQLite forever anyway: as far as most developers are concerned, they will call a black-box that returns a product stored locally, from its barcode. Regardless of the type of database behind. If for some very good reason we feel appropriate in the future to switch to another type of database, nothing should stop us and it will be transparent for the developers.
More specifically about the new SQLite database, we'll create a "table" in it (that's the relational database terminology), that will contain all the (e.g. downloaded) products.
For obvious reasons, the "primary key" (unique, not null) will be a "column" name barcode
, of type TEXT
.
Another obvious column is the JSON version of the product, maybe called encoded_json
, of type TEXT
(maybe BLOB
, I have to double-check that).
Another interesting column would be the last database insert/upsert timestamp, maybe called last_upsert
, of type INT
.
As we already saw, the extraction of the JSON part is more or less painful.
nutriscore
or lastModified
? It will be much easier to know in advance which fields are interesting, whenever we're filtering the whole database. When we're filtering a small subset it's not important: we'll select all the products anyway, and then apply the filtering/ordering in dart, not in SQLite. Anyway, it will still be possible to say "hey wait a minute, we definitely need to filter on brands, can we add it in the new version?", but it's better if we can avoid the corresponding refresh phase.At this point, I'm more thinking of the SQLite database as a big cache of product data, that we access only as a key (code) / value (json) cache. Whenever we need data for a specific product (that we scan, or that we have in a list like the history of scanned products), we look for the corresponding codes through the database (and through the API if we have some network).
Those lists will probably be reasonably small (less than 1000 products), so if we want to filter them, we can do it outside the database.
If we wanted to support a full offline search / exploration (like the current "category browser" that is in the app), then we could have separate fields for categories etc. but at this point this feature is very experimental and very likely to change. Let's do it through the OFF API first, and if there's really a need to make it available offline (which is less and less of a need as connectivity gets more ubiquitous), we can see that later.
We could consider adding a last access time (so that we can expire entries that are never accessed), but then that would mean updating the entries whenever we access them. Not sure if it's worth it.
A "source" field would be useful, so that we know if the entry was added through a scan, a search, a downloaded / initial database etc. Maybe also a field to indicate whether the product belongs to at least one list.
Ok, I'll stay with my 3 columns then for SQLite table product
. Something like that:
create table product(
barcode TEXT PRIMARY KEY,
encoded_json TEXT NOT NULL,
last_upsert INT NOT NULL
)
And about additional fields like "source", it's not related to a product, it's related to a product list:
I think it is more logical to make a list out of the last one. This leads to a better history for and statistics for example how often a user scanned a certain produkt.
I don't know if this has already been said somewhere, but I wanted to mention that scanning no longer works for me in the current master.
E/flutter (15852): [ERROR:flutter/lib/ui/ui_dart_state.cc(177)] Unhandled Exception: FormatException: Invalid double
E/flutter (15852): null
E/flutter (15852): #0 double.parse (dart:core-patch/double_patch.dart:111:28)
E/flutter (15852): #1 AttributeGroups.fromJson (package:openfoodfacts/model/AttributeGroups.dart:67:27)
E/flutter (15852): #2 _$ProductFromJson (package:openfoodfacts/model/Product.g.dart:71:25)
E/flutter (15852): #3 new Product.fromJson (package:openfoodfacts/model/Product.dart:180:7)
E/flutter (15852): #4 _$ProductResultFromJson (package:openfoodfacts/model/ProductResult.g.dart:16:19)
E/flutter (15852): #5 new ProductResult.fromJson (package:openfoodfacts/model/ProductResult.dart:20:7)
E/flutter (15852): #6 OpenFoodAPIClient.getProduct (package:openfoodfacts/openfoodfacts.dart:143:42)
I don't know if this has already been said somewhere, but I wanted to mention that scanning no longer works for me in the current master.
E/flutter (15852): [ERROR:flutter/lib/ui/ui_dart_state.cc(177)] Unhandled Exception: FormatException: Invalid double E/flutter (15852): null E/flutter (15852): #0 double.parse (dart:core-patch/double_patch.dart:111:28) E/flutter (15852): #1 AttributeGroups.fromJson (package:openfoodfacts/model/AttributeGroups.dart:67:27) E/flutter (15852): #2 _$ProductFromJson (package:openfoodfacts/model/Product.g.dart:71:25) E/flutter (15852): #3 new Product.fromJson (package:openfoodfacts/model/Product.dart:180:7) E/flutter (15852): #4 _$ProductResultFromJson (package:openfoodfacts/model/ProductResult.g.dart:16:19) E/flutter (15852): #5 new ProductResult.fromJson (package:openfoodfacts/model/ProductResult.dart:20:7) E/flutter (15852): #6 OpenFoodAPIClient.getProduct (package:openfoodfacts/openfoodfacts.dart:143:42)
This error comes from the openfoodfacts-dart plugin, will investigate
@PrimaelQuemerais I think it's fixed in https://github.com/openfoodfacts/openfoodfacts-dart/pull/65, where the double match
field is more safely loaded with JsonObject.parseDouble(json[_JSON_TAG_MATCH])
rather that double.parse(attribute['match'].toString())
.
Is that normal that we get null
for match
: that's another story...
Hey, work on your exams instead ;)
Great! I will take a look at the PR. Haha thanks I was just taking a break :smile:
For the moment, user preferences are stored in
SharedPreferences
, which is appropriate, and products are stored in a Sembast database in full_products_database.dart.I'm not 100% convinced by Sembast, my personal taste is rather SQLite (with a bit of JSON in it), but this can be challenged, of course.
My concern is rather what we do with the database. For the moment, not that much: we populate the database when we download products, but we never query the database.
I think it would be interesting to actually use the database: