teolemon opened this issue 2 years ago
Generating that kind of data for 1000 or more products cannot be done on demand, but we can pre-compute it for all countries x their official languages instead.
In fact we already have an on-demand version, the search API: https://fr.openfoodfacts.org/api/v2/search?fields=code,product_name&page_size=1000 (but that puts 10 seconds of heavy load on the server, so we could request it when users specifically ask for it in a menu, but certainly not at app initialization, where a pre-computed dump would make much more sense).
Knowledge panels are too big for an offline dump.
So, to experiment on mobile: just use the search API. Once we're happy with it, we can generate 10k dumps for all countries, with results in exactly the same format as the search API.
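For experimentation, a minimal sketch of that search API call from a script could look like this (plain Python with `requests`; the endpoint and parameters are the ones quoted above, everything else is illustrative):

```python
import requests

# Same request as the search API link above: top 1000 products, two fields only.
# This is heavy for the server, so trigger it on explicit user request,
# not at app initialization.
url = "https://fr.openfoodfacts.org/api/v2/search"
params = {"fields": "code,product_name", "page_size": 1000}

response = requests.get(url, params=params, timeout=60)
response.raise_for_status()
products = response.json().get("products", [])
print(f"Fetched {len(products)} products")
```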
Probably solved by a proper implementation of the Search V2 API in the Dart package.
If images don't matter (as is often the case), CSV exports are very easy to do with Mirabelle (the limit is 1,000,000 lines but we can change it), see below.
I understand the use cases that need images, but is it reasonable to add dozens of MB on a smartphone just to make the search nicer?
1 -- Build your query (or ask someone to build it for you). E.g. all German products that have been scanned at least one time.
-- Products from Germany that have been scanned at least one time
select code, product_name from [all]
where countries_en like "%germany%" and unique_scans_n is not null
order by unique_scans_n desc
-- the limit here displays 20 results; remove it or comment it with "--" when you build your CSV export
limit 20
2 -- Copy the "CSV" link on the result page.
3 -- If necessary, edit the link to remove the "limit+20" part so you get all the products. E.g. (don't click this link if you don't want to get 90,000+ products): https://mirabelle.openfoodfacts.org/products.csv?sql=--+Products+from+Germany+that+have+been+scanned+at+least+one+time%0D%0Aselect+code%2C+product_name+from+%5Ball%5D%0D%0Awhere+countries_en+like+%22%25germany%25%22+and+unique_scans_n+is+not+null%0D%0Aorder+by+unique_scans_n+desc%0D%0A--+the+limit+here+displays+20+results%3B+remove+it+or+comment+it+with+%22--%22+when+you+build+your+CSV+export%0D%0A&_size=max
Now you can use this link to download the CSV with your favourite tool (wget, curl, web browser, etc.).
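For example, a minimal download sketch in Python (the URL is the edited Mirabelle CSV link from step 3, truncated here; the output file name is arbitrary):

```python
import requests

# Paste the full Mirabelle CSV link (with the "limit 20" removed) here.
csv_url = "https://mirabelle.openfoodfacts.org/products.csv?sql=...&_size=max"

# Stream the response to disk so the 90,000+ rows don't have to fit in memory.
with requests.get(csv_url, stream=True, timeout=300) as response:
    response.raise_for_status()
    with open("german_products.csv", "wb") as f:
        for chunk in response.iter_content(chunk_size=8192):
            f.write(chunk)
```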
@teolemon Not sure what you're worrying about, as we already download the top 1k products in just one shot in Smoothie (without KP). If we can manage to extract just the barcodes of the top 10k products, we can loop the product download on a selection of 1k barcodes each time.
@monsieurtanuki Which fields do you need in the dump? Attributes but not knowledge panels? Would this be only to show the scan card, with opening the product page going through a live query?
@stephanegigandet I just need the barcodes, sorted by descending popularity. From the barcodes I can get everything in subsequent queries.
Since I split the server queries into smaller queries (e.g. with page numbers), I stay robust and fast, compared to a hypothetical single 10k query that is (perhaps) demanding for the server, requires the connection not to fail, prevents the app from doing any other background task in the meantime, and needs to download a huge amount of data at once and de-json it in one shot. In that specific case I also split the work into two phases: 1- get the top barcodes, and 2- get the related products.
The point being in the end to download the top 10k products (many fields but not the KP), in the background.
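For illustration, a sketch of that two-phase, paginated approach might look like this (the `sort_by=popularity` parameter, the comma-separated `code` filter and the field list are assumptions about the v2 search API rather than anything confirmed in this thread; batch sizes are illustrative):

```python
import requests

API = "https://world.openfoodfacts.org/api/v2/search"
PAGE_SIZE = 1000  # one server query per 1k products, as discussed above

def top_barcodes(total=10_000):
    """Phase 1: collect the barcodes of the top `total` products, page by page."""
    barcodes = []
    for page in range(1, total // PAGE_SIZE + 1):
        resp = requests.get(API, params={
            "fields": "code",
            "sort_by": "popularity",  # assumed name for "descending popularity"
            "page_size": PAGE_SIZE,
            "page": page,
        }, timeout=60)
        resp.raise_for_status()
        barcodes += [p["code"] for p in resp.json().get("products", [])]
    return barcodes

def fetch_products(barcodes, fields="code,product_name,attribute_groups"):
    """Phase 2: fetch one batch of products by barcode (no knowledge panels)."""
    resp = requests.get(API, params={
        "code": ",".join(barcodes),  # assumed: v2 search accepts a comma-separated code list
        "fields": fields,
        "page_size": len(barcodes),
    }, timeout=120)
    resp.raise_for_status()
    return resp.json().get("products", [])

# Loop the product download on a selection of barcodes each time; smaller
# batches keep the GET URL short and each request cheap to retry on failure.
all_codes = top_barcodes()
for i in range(0, len(all_codes), 100):
    batch = fetch_products(all_codes[i:i + 100])
    # ... store `batch` in the local database here
```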
@monsieurtanuki we are not sure the server will survive search queries from a lot of users (at this time mongodb is still on a small server). As this is really something which is shared between users, it would seem more logical to generate an archive that you can download. If you describe what you need, it might be easy to code a small script that generates an archive (which could be updated every week).
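As a starting point for such a script, here is a hedged sketch assuming direct access to the MongoDB products collection and the unique_scans_n field used in the Mirabelle query above (connection string, database name and output file are illustrative):

```python
import gzip
import json

from pymongo import DESCENDING, MongoClient

# Connect to the products database (connection string and db name are illustrative).
client = MongoClient("mongodb://localhost:27017")
products = client["off"]["products"]

# Top 10k barcodes by number of unique scans, most popular first.
cursor = (
    products.find({"unique_scans_n": {"$gt": 0}}, {"_id": 0, "code": 1})
    .sort("unique_scans_n", DESCENDING)
    .limit(10_000)
)
top_codes = [doc["code"] for doc in cursor]

# Write a small gzipped archive that clients can download; re-run weekly (e.g. via cron).
with gzip.open("top_10k_barcodes.json.gz", "wt", encoding="utf-8") as f:
    json.dump(top_codes, f)
```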
Part of #6988