openfoodfacts / openfoodfacts-server

Open Food Facts database, API server and web interface - 🐪🦋 Perl, CSS and JS coders welcome 😊 For helping in Python, see Robotoff or taxonomy-editor
http://openfoodfacts.github.io/openfoodfacts-server/
GNU Affero General Public License v3.0

Improve scanbot, generate more accurate country top products, store each product scans in scans.json files #5083

Open stephanegigandet opened 3 years ago

stephanegigandet commented 3 years ago

I should have opened an issue for this before coding it, but better late than never. Filing it now so that we can find it, and the PR, more easily later if needed.

https://github.com/openfoodfacts/openfoodfacts-server/pull/5079

We need scan data to be available for each product. While it would be nice to have data by year, month, day, etc. (see #1501), it would already be very useful to have it by year and by country.

This update to scanbot creates a new scans.sto file with aggregated scan data by year and country.

Example:


./sto_to_xml.pl /srv/off/products/800/227/001/4901/scans.sto 
/srv/off/products/800/227/001/4901/scans.sto$VAR1 = {
          '2020' => {
                      'unique_scans_n_by_country' => {
                                                       'world' => 4,
                                                       'fr' => 2,
                                                       'be' => 2
                                                     },
                      'unique_scans_n' => 4,
                      'scans_n' => 4,
                      'unique_scans_rank_by_country' => {
                                                          'fr' => 19,
                                                          'world' => 1,
                                                          'be' => 1
                                                        }
                    }
        };
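
To make the structure above concrete, here is a minimal Python sketch of how such a per-year summary could be built for one product. It is not the scanbot code (which is Perl): the input format, a list of `(year, country_code, visitor_id)` tuples, and the function name `aggregate_scans` are assumptions made for illustration, as is the idea that a "unique" scan means a distinct visitor.

```python
from collections import defaultdict


def aggregate_scans(events):
    """Aggregate scan events for ONE product by year and country.

    events: iterable of (year, country_code, visitor_id) tuples.
    This input format is hypothetical; the real scanbot log format
    is not described in this issue.
    """
    scans_n = defaultdict(int)                          # all scans per year
    visitors = defaultdict(lambda: defaultdict(set))    # year -> country -> visitor ids

    for year, country, visitor in events:
        scans_n[year] += 1
        visitors[year][country].add(visitor)
        visitors[year]["world"].add(visitor)            # 'world' covers every country

    summary = {}
    for year, by_country in visitors.items():
        summary[str(year)] = {
            "scans_n": scans_n[year],
            "unique_scans_n": len(by_country["world"]),
            "unique_scans_n_by_country": {
                country: len(ids) for country, ids in by_country.items()
            },
        }
    return summary


# Four scans by four different visitors, two in France and two in Belgium.
events = [
    (2020, "fr", "visitor-a"),
    (2020, "fr", "visitor-b"),
    (2020, "be", "visitor-c"),
    (2020, "be", "visitor-d"),
]
summary = aggregate_scans(events)
```

With these four events the counts match the example above (4 for world, 2 for fr, 2 for be). The rank fields are left out here because they need the counts of every other product as well.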

It also changes how we compute the top products per country, to make the lists more accurate. Currently, the top list for each country is derived from the world top list: if a product is sold in a country, it is added to that country's top list. As a result, the most globally popular products always come first, especially in countries where we have few scans compared to others.

The new approach computes the top products for a country based only on the scans for that country.
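
As a rough illustration of per-country ranking, the Python sketch below ranks products within each country using only that country's unique scan counts. The function name `rank_by_country`, the product codes `A` and `B`, the sample counts, and the tie-breaking rule (by product code) are all assumptions; the actual implementation is in the PR linked above.

```python
def rank_by_country(unique_scans):
    """Rank products per country from that country's own scan counts.

    unique_scans: {product_code: {country: unique_scans_n}}
    returns:      {product_code: {country: rank}}, rank 1 = most scanned.
    """
    # Regroup the counts by country.
    per_country = {}
    for code, by_country in unique_scans.items():
        for country, n in by_country.items():
            per_country.setdefault(country, []).append((n, code))

    ranks = {}
    for country, entries in per_country.items():
        # Highest count first; ties broken by product code (an assumption,
        # chosen only to make the result deterministic).
        entries.sort(key=lambda entry: (-entry[0], entry[1]))
        for rank, (_n, code) in enumerate(entries, start=1):
            ranks.setdefault(code, {})[country] = rank
    return ranks


# Hypothetical data: product A is hugely popular in the US,
# product B is scanned almost only in Belgium.
unique_scans = {
    "A": {"world": 100, "us": 99, "be": 1},
    "B": {"world": 10, "be": 10},
}
ranks = rank_by_country(unique_scans)
```

In this made-up data, A ranks first for the world, yet B ranks first in Belgium because only Belgian scans count there. Under the previous approach, A would have topped the Belgian list simply because it is sold there and leads the world list.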

github-actions[bot] commented 9 months ago

This issue has been open 90 days with no activity. Can you give it a little love by linking it to a parent issue, adding relevant labels and projects, creating a mockup if applicable, adding code pointers from https://github.com/openfoodfacts/openfoodfacts-server/blob/main/.github/labeler.yml, giving it a priority, editing the original issue to have a more comprehensive description… Thank you very much for your contribution to 🍊 Open Food Facts