openfoodfacts / smooth-app

🤳🥫 The new Open Food Facts mobile application for Android and iOS, crafted with Flutter and Dart
https://world.openfoodfacts.org/open-food-facts-mobile-app?utm_source=off&utf_medium=web&utm_campaign=github-repo
Apache License 2.0

Define the scope of offline data #2461

Closed: teolemon closed this issue 3 months ago

teolemon commented 2 years ago

What

Per @AshAman999's computation in #2447:

All 2.4M barcodes

My estimates were about the same here as well (17 MB). I thought of,
besides https://fr.openfoodfacts.org/api/v2/search?fields=code&page_size=100, I
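Here is a minimal sketch of how a barcode-only list could be paged through the search endpoint quoted above. It rests on three assumptions that should be checked against the API documentation. First, the v2 search response is a JSON object with a `products` list whose items carry a `code` field, which is what `fields=code` asks for. Second, pagination uses a 1-based `page` query parameter. Third, `page_urls` and `extract_codes` are hypothetical helper names. No network call is made here.

```python
import json
import math
from urllib.parse import urlencode

SEARCH_URL = "https://fr.openfoodfacts.org/api/v2/search"

def page_urls(total_products: int, page_size: int = 100) -> list[str]:
    """Build one search URL per page needed to list every barcode.

    Assumption: 1-based pagination through a `page` query parameter.
    """
    pages = math.ceil(total_products / page_size)
    return [
        f"{SEARCH_URL}?{urlencode({'fields': 'code', 'page_size': page_size, 'page': page})}"
        for page in range(1, pages + 1)
    ]

def extract_codes(response_text: str) -> list[str]:
    """Pull the barcodes out of one search response body (assumed shape)."""
    payload = json.loads(response_text)
    return [p["code"] for p in payload.get("products", []) if "code" in p]
```

With 2.4M barcodes and `page_size=100`, `page_urls(2_400_000)` produces 24,000 request URLs.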

Products (everything, including knowledge panels (KP), compressed)

Around 7 kB per product
75 MB for 10k products
750 MB for 100k products
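The product figures above are a linear extrapolation from an average size per product. The small helper below makes that arithmetic explicit. It assumes decimal units (1 MB = 1000 kB), and the function names are illustrative. Note that 7 kB × 10k gives 70 MB, so the quoted 75 MB corresponds to about 7.5 kB per product.

```python
def offline_size_mb(per_item_kb: float, item_count: int) -> float:
    """Linear storage estimate in MB (decimal units: 1 MB = 1000 kB)."""
    return per_item_kb * item_count / 1000

def per_item_kb(total_mb: float, item_count: int) -> float:
    """Inverse: average size per item implied by a measured total."""
    return total_mb * 1000 / item_count

for count in (10_000, 100_000):
    print(f"{count:>7} products: {offline_size_mb(7.5, count):.0f} MB at 7.5 kB each")
```

Applying the same 7.5 kB average to all 2.4M products gives about 18,000 MB (18 GB). That is why the scope of offline data has to be limited.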

Images

https://images.openfoodfacts.org/images/products/335/003/014/4593/front_fr.24.100.jpg
https://images.openfoodfacts.org/images/products/335/003/014/4593/front_fr.24.full.jpg

107 MB for 10k products (front image only), see https://squoosh.app/editor
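The two image URLs above follow a visible pattern:

- the 13-digit barcode 3350030144593 is split into `335/003/014/4593`;
- the file name combines the image field (`front_fr`), a revision number (`24`) and a size (`100` or `full`).

The sketch below rebuilds such URLs. It assumes the 3/3/3/rest split applies to every 13-digit barcode. Other barcode lengths may follow a different rule, so the helper rejects them instead of guessing. `barcode_path` and `image_url` are hypothetical names.

```python
IMAGE_BASE = "https://images.openfoodfacts.org/images/products"

def barcode_path(barcode: str) -> str:
    """Split a 13-digit barcode into the 3/3/3/rest folder layout seen in the URLs."""
    if len(barcode) != 13 or not barcode.isdigit():
        # Assumption: only the 13-digit layout is confirmed by the example URLs.
        raise ValueError(f"unsupported barcode: {barcode!r}")
    return f"{barcode[0:3]}/{barcode[3:6]}/{barcode[6:9]}/{barcode[9:]}"

def image_url(barcode: str, image_id: str, revision: int, size: str) -> str:
    """Build an image URL such as .../front_fr.24.100.jpg (size "100" or "full")."""
    return f"{IMAGE_BASE}/{barcode_path(barcode)}/{image_id}.{revision}.{size}.jpg"
```

At the quoted 107 MB per 10k products, each front image averages about 10.7 kB. For the same 10k products, the images alone would weigh more than the 75 MB of product data.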

Part of

monsieurtanuki commented 2 years ago

@teolemon That would mean extracting those product fields from the server:

That can be implemented in pure SQL with the following tables:

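-- Products kept offline, looked up by barcode
create table offline_product(
    id integer primary key autoincrement,
    barcode text unique not null,
    brands text,
    name text);

-- Attribute ids (text_id), stored once
create table offline_attribute(
    id integer primary key autoincrement,
    text_id text unique not null);

-- One score per (product, attribute) pair
create table offline_product_attribute(
    product_id integer not null references offline_product(id),
    attribute_id integer not null references offline_attribute(id),
    score real not null,
    primary key (product_id, attribute_id));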
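Below is a minimal sketch of the three tables in SQLite, through Python's built-in `sqlite3`. It includes one query that this relational layout makes cheap: products whose score on a given attribute reaches a threshold, best first. The attribute id `nutriscore`, the barcodes and the scores are made-up toy data. How real attribute scores would be mapped onto `score` is an assumption.

```python
import sqlite3

SCHEMA = """
create table offline_product(
    id integer primary key autoincrement,
    barcode text unique not null,
    brands text,
    name text);
create table offline_attribute(
    id integer primary key autoincrement,
    text_id text unique not null);
create table offline_product_attribute(
    product_id integer not null references offline_product(id),
    attribute_id integer not null references offline_attribute(id),
    score real not null,
    primary key (product_id, attribute_id));
"""

def products_with_min_score(conn, attribute, min_score):
    """Barcodes of products scoring at least `min_score` on `attribute`, best first."""
    rows = conn.execute(
        """
        select p.barcode
        from offline_product p
        join offline_product_attribute pa on pa.product_id = p.id
        join offline_attribute a on a.id = pa.attribute_id
        where a.text_id = ? and pa.score >= ?
        order by pa.score desc
        """,
        (attribute, min_score),
    )
    return [barcode for (barcode,) in rows]

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
# Toy data (hypothetical barcodes, attribute id and scores).
conn.executemany(
    "insert into offline_product(barcode, brands, name) values (?, ?, ?)",
    [("0000000000001", "BrandA", "Product 1"), ("0000000000002", "BrandB", "Product 2")],
)
conn.execute("insert into offline_attribute(text_id) values ('nutriscore')")
conn.executemany(
    "insert into offline_product_attribute values (?, 1, ?)", [(1, 80.0), (2, 40.0)]
)
print(products_with_min_score(conn, "nutriscore", 50))  # ['0000000000001']
```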

Or something more compact, like a dedicated table with all the attributes as columns.
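The more compact alternative is a single table with one column per attribute, which could be generated from the list of attribute ids. The sketch below assumes attribute ids are plain lowercase identifiers. It checks them against a pattern and quotes them, rather than pasting untrusted text into SQL. The table name `offline_product_wide` and the attribute ids in the example call are hypothetical.

```python
import re
import sqlite3

IDENTIFIER = re.compile(r"^[a-z_][a-z0-9_]*$")

def wide_table_ddl(attribute_ids: list[str]) -> str:
    """CREATE TABLE statement with one nullable REAL score column per attribute."""
    for attribute_id in attribute_ids:
        if not IDENTIFIER.match(attribute_id):
            raise ValueError(f"not a safe column name: {attribute_id!r}")
    # Quoted names avoid clashes with SQL keywords once the pattern check has passed.
    columns = ["barcode text primary key not null"] + [f'"{a}" real' for a in attribute_ids]
    return "create table offline_product_wide(\n    " + ",\n    ".join(columns) + ")"

conn = sqlite3.connect(":memory:")
conn.execute(wide_table_ddl(["nutriscore", "ecoscore", "nova"]))
```

The trade-off is that each new attribute then needs a schema change (`alter table ... add column`). In exchange, a product and all its scores are read from a single row.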

The thing is, caching tons of products locally is a good idea, but performance will be very poor if we keep JSON there. What would a typical query be? For the moment Smoothie is naive: we just ask for a barcode and get the corresponding JSON product. The barcode is the primary key, fair enough. What's the purpose of the offline database? If it's the same getProductFromBarcode, we can keep JSON. If it's "get me other products from the same brand / the same category / that suit me better", we need dedicated table columns. Otherwise every such query will have to JSON-decode the whole database.
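This sketch makes the trade-off concrete. With JSON-only storage, a barcode lookup is still a primary-key read. A query such as "other products from the same brand", however, has to fetch and decode every row. The sketch compares that with the same data plus an extracted, indexed `brands` column. The table names are hypothetical, and a single-valued `brands` string is a simplification.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
# JSON-only storage: barcode is the key, everything else is an opaque blob.
conn.execute("create table product_json(barcode text primary key, json text not null)")
# Same data with the queried field extracted into an indexed column.
conn.execute("create table product_indexed(barcode text primary key, brands text, json text not null)")
conn.execute("create index idx_product_brands on product_indexed(brands)")

products = [  # toy data, hypothetical barcodes and brands
    {"code": "0000000000001", "brands": "BrandA"},
    {"code": "0000000000002", "brands": "BrandB"},
    {"code": "0000000000003", "brands": "BrandA"},
]
for p in products:
    conn.execute("insert into product_json values (?, ?)", (p["code"], json.dumps(p)))
    conn.execute("insert into product_indexed values (?, ?, ?)", (p["code"], p["brands"], json.dumps(p)))

def same_brand_json(brand):
    """JSON-only: every row must be fetched and decoded to test its brand."""
    return sorted(
        barcode
        for barcode, blob in conn.execute("select barcode, json from product_json")
        if json.loads(blob).get("brands") == brand
    )

def same_brand_indexed(brand):
    """Extracted column: SQLite can use the index on `brands` instead of decoding rows."""
    rows = conn.execute(
        "select barcode from product_indexed where brands = ? order by barcode", (brand,)
    )
    return [barcode for (barcode,) in rows]

print(same_brand_json("BrandA"))     # ['0000000000001', '0000000000003']
print(same_brand_indexed("BrandA"))  # ['0000000000001', '0000000000003']
```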

We would be ignoring these ones:

monsieurtanuki commented 2 years ago

I'm about to start a new project called "fast food":

Creating a separate project sounds like a good idea to me:

teolemon commented 2 years ago

Note that @AshAman999 is working on this as part of his Google Summer of Code project: https://wiki.openfoodfacts.org/GSOC_2022_-_Offline_Smoothie

monsieurtanuki commented 2 years ago

@teolemon @AshAman999 Oops, then I'll stop. I would suggest starting in a separate project and focusing on the read-only mode first.
