Define the scope of offline data

teolemon commented 2 years ago

What

On the old iOS version, we stored the product name, brand, Nutri-Score, NOVA and Eco-Score for every product we have in the db.
Define the scope of offline data for the new version, keeping in mind that we should probably let the user decide, with a sane default.

Per @AshAman999 's computation in #2447

All 2.4M barcodes

My estimates were roughly the same here as well (17 MB). I thought of,
Besides https://fr.openfoodfacts.org/api/v2/search?fields=code&page_size=100 I

[ ] Create a server side solution to dump all barcodes
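
To give a sense of why a server-side dump is attractive, here is a minimal sketch of what paging through the search endpoint quoted above would cost. It only builds URLs and counts pages, it makes no network calls. The `page` query parameter and the helper names are assumptions for illustration, not something confirmed in this thread.

```python
import math
from urllib.parse import urlencode

# Endpoint taken from the URL quoted above.
SEARCH_URL = "https://fr.openfoodfacts.org/api/v2/search"


def page_count(total_barcodes: int, page_size: int) -> int:
    """Number of requests needed to list every barcode once."""
    return math.ceil(total_barcodes / page_size)


def page_url(page: int, page_size: int = 100) -> str:
    """Build the URL for one page; 'page' is an assumed pagination parameter."""
    query = urlencode({"fields": "code", "page_size": page_size, "page": page})
    return f"{SEARCH_URL}?{query}"


# 2.4M barcodes at 100 per page.
print(page_count(2_400_000, 100))  # 24000
print(page_url(1))
```

At 100 barcodes per page, the full list takes 24,000 requests, which is the kind of load a single pre-built dump would avoid.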

Products (everything including KP, compressed)

Around 7 kB for each product
75 MB for 10k products
750 MB for 100k products
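
As a quick check of the arithmetic behind these figures, here is a small sketch. It assumes the sizes are in kilobytes and megabytes with 1 MB = 1000 kB; the function names are made up for illustration.

```python
def estimate_mb(product_count: int, kb_per_product: float) -> float:
    """Total storage in MB for product_count products (1 MB = 1000 kB assumed)."""
    return product_count * kb_per_product / 1000


def implied_kb_per_product(total_mb: float, product_count: int) -> float:
    """Per-product size in kB implied by a total size in MB."""
    return total_mb * 1000 / product_count


print(estimate_mb(10_000, 7))              # 70.0
print(implied_kb_per_product(75, 10_000))  # 7.5
print(estimate_mb(100_000, 7.5))           # 750.0
```

Exactly 7 kB per product would give 70 MB for 10k products, so the 75 MB and 750 MB totals correspond to roughly 7.5 kB per product, consistent with "around 7".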

Images

https://images.openfoodfacts.org/images/products/335/003/014/4593/front_fr.24.100.jpg
https://images.openfoodfacts.org/images/products/335/003/014/4593/front_fr.24.full.jpg

107 MB for 10k products (front image) https://squoosh.app/editor

Part of

#2446

monsieurtanuki commented 2 years ago

@teolemon That would mean extracting those product fields from the server:

[x] NAME
[x] BRANDS
[x] BARCODE
[x] ATTRIBUTE_GROUPS

That can be implemented in pure SQL with the following tables:

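-- Written for SQLite (assumed dialect); the integer ids are internal keys.
create table offline_product(
    id integer primary key autoincrement,
    barcode text unique not null,
    brands text,
    name text);

-- One row per attribute identifier.
create table offline_attribute(
    id integer primary key autoincrement,
    text_id text unique not null);

-- Score of one attribute for one product.
create table offline_product_attribute(
    product_id integer not null references offline_product(id),
    attribute_id integer not null references offline_attribute(id),
    score real not null,
    primary key (product_id, attribute_id));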

Or something more compact, like a dedicated table with all the attributes as columns.
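
To make the three-table layout concrete, here is a minimal sketch using Python's built-in `sqlite3` module with an in-memory database. The DDL is adapted to SQLite syntax (`integer primary key autoincrement`, `unique not null`), and the barcodes, brands, names, scores and the `nutriscore` attribute id are hypothetical sample data, not real products.

```python
import sqlite3

SCHEMA = """
create table offline_product(
    id integer primary key autoincrement,
    barcode text unique not null,
    brands text,
    name text);
create table offline_attribute(
    id integer primary key autoincrement,
    text_id text unique not null);
create table offline_product_attribute(
    product_id integer not null references offline_product(id),
    attribute_id integer not null references offline_attribute(id),
    score real not null,
    primary key (product_id, attribute_id));
"""


def build_db() -> sqlite3.Connection:
    """Create the schema in memory and load hypothetical sample rows."""
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    db.executemany(
        "insert into offline_product(barcode, brands, name) values (?, ?, ?)",
        [
            ("0000000000017", "Brand A", "Oat cookies"),
            ("0000000000024", "Brand A", "Chocolate cookies"),
            ("0000000000031", "Brand B", "Rice cakes"),
        ],
    )
    db.execute("insert into offline_attribute(text_id) values ('nutriscore')")
    # Ids start at 1 in a fresh database, so products are 1..3 and the attribute is 1.
    db.executemany(
        "insert into offline_product_attribute values (?, ?, ?)",
        [(1, 1, 60.0), (2, 1, 25.0), (3, 1, 90.0)],
    )
    return db


def best_in_brand(db: sqlite3.Connection, brand: str, attribute: str) -> list:
    """Products of one brand, ordered by the score of one attribute (best first)."""
    return db.execute(
        """
        select p.barcode, p.name, pa.score
        from offline_product p
        join offline_product_attribute pa on pa.product_id = p.id
        join offline_attribute a on a.id = pa.attribute_id
        where p.brands = ? and a.text_id = ?
        order by pa.score desc
        """,
        (brand, attribute),
    ).fetchall()


db = build_db()
for row in best_in_brand(db, "Brand A", "nutriscore"):
    print(row)
```

A query like `best_in_brand` runs entirely in SQL, without decoding any JSON. The more compact variant with one column per attribute would drop the two joins at the cost of a schema change whenever an attribute is added.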

Caching tons of products locally is a good idea, but performance will be very poor if we keep JSON there. What would a typical query be? For the moment we're dumb in Smoothie: we just ask for a barcode and get the corresponding JSON product. That's the primary key, fair enough. So what's the purpose of the offline database? If it's the same getProductFromBarcode, we can keep JSON. If it's "get me other products from the same brand / the same category / that suit me better", we need to create other table columns. If we don't, each such query will have to JSON-decode the whole database.
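
The difference between the two access patterns can be sketched as follows, again with Python's `sqlite3` and an in-memory table. The `json_product` table, the Python function names and the sample products are assumptions for illustration; only the idea of a barcode-keyed JSON store comes from the discussion above.

```python
import json
import sqlite3


def build_json_db(products: list) -> sqlite3.Connection:
    """Store each product as a JSON blob keyed by barcode."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "create table json_product(barcode text primary key, json text not null)"
    )
    db.executemany(
        "insert into json_product values (?, ?)",
        [(p["code"], json.dumps(p)) for p in products],
    )
    return db


def get_product_from_barcode(db: sqlite3.Connection, barcode: str):
    """Primary-key lookup: one row read, one JSON decode."""
    row = db.execute(
        "select json from json_product where barcode = ?", (barcode,)
    ).fetchone()
    return json.loads(row[0]) if row else None


def products_of_brand(db: sqlite3.Connection, brand: str):
    """Without a brand column, every row has to be decoded to be filtered."""
    decoded = 0
    matches = []
    for (raw,) in db.execute("select json from json_product"):
        product = json.loads(raw)
        decoded += 1
        if product.get("brands") == brand:
            matches.append(product)
    return matches, decoded


# Hypothetical sample products.
SAMPLE = [
    {"code": "0000000000017", "brands": "Brand A", "product_name": "Oat cookies"},
    {"code": "0000000000024", "brands": "Brand A", "product_name": "Chocolate cookies"},
    {"code": "0000000000031", "brands": "Brand B", "product_name": "Rice cakes"},
]

db = build_json_db(SAMPLE)
print(get_product_from_barcode(db, "0000000000031")["product_name"])
matches, decoded = products_of_brand(db, "Brand A")
print(len(matches), "matches after decoding", decoded, "rows")
```

The barcode lookup touches a single row, while the brand query decodes every stored product. With 100k cached products, that is 100k JSON decodes per query unless the brand is stored in its own column.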

We would be ignoring these ones:

[ ] NUTRISCORE (duplicated with ATTRIBUTE_GROUPS)
[ ] FRONT_IMAGE
[ ] IMAGE_FRONT_SMALL_URL
[ ] IMAGE_FRONT_URL
[ ] IMAGE_INGREDIENTS_URL
[ ] IMAGE_NUTRITION_URL
[ ] IMAGE_PACKAGING_URL
[ ] SELECTED_IMAGE
[ ] QUANTITY
[ ] SERVING_SIZE
[ ] STORES
[ ] PACKAGING_QUANTITY
[ ] PACKAGING
[ ] PACKAGING_TAGS
[ ] PACKAGING_TEXT_IN_LANGUAGES
[ ] PACKAGING_TEXT_ALL_LANGUAGES
[ ] NO_NUTRITION_DATA
[ ] NUTRIMENTS
[ ] NUTRIENT_LEVELS
[ ] NUTRIMENT_ENERGY_UNIT
[ ] ADDITIVES
[ ] INGREDIENTS_ANALYSIS_TAGS
[ ] INGREDIENTS_TEXT
[ ] LABELS_TAGS
[ ] LABELS_TAGS_IN_LANGUAGES
[ ] ENVIRONMENT_IMPACT_LEVELS
[ ] COMPARED_TO_CATEGORY
[ ] CATEGORIES_TAGS
[ ] CATEGORIES_TAGS_IN_LANGUAGES
[ ] LANGUAGE
[ ] STATES_TAGS
[ ] ECOSCORE_DATA
[ ] ECOSCORE_GRADE
[ ] ECOSCORE_SCORE
[ ] KNOWLEDGE_PANELS
[ ] COUNTRIES
[ ] COUNTRIES_TAGS
[ ] COUNTRIES_TAGS_IN_LANGUAGES
[ ] EMB_CODES

monsieurtanuki commented 2 years ago

I'm about to start a new project called "fast food":

experimental Flutter project
access to offline food data in read-only mode
the simplest possible UI, no camera, no barcode scan
the most compact SQL database and the best performance

Creating a project aside sounds like a good idea to me:

no interference with the rest of Smoothie
best conditions to measure performance
both the failed and the successful attempts can be an inspiration for Smoothie

teolemon commented 2 years ago

Note that @AshAman999 is working on this as part of his Google Summer of Code project: https://wiki.openfoodfacts.org/GSOC_2022_-_Offline_Smoothie

monsieurtanuki commented 2 years ago

@teolemon @AshAman999 Oops, then I'll stop. I would suggest doing it in a separate project first, and focusing on the read-only mode first.

