openfoodfacts / openfoodfacts-server

Open Food Facts database, API server and web interface - 🐪🦋 Perl, CSS and JS coders welcome 😊 To help with Python, see Robotoff or taxonomy-editor
http://openfoodfacts.github.io/openfoodfacts-server/
GNU Affero General Public License v3.0

Don't add ingredients' percentages when inside parentheses #8721

Status: Open. CharlesNepote opened this issue 1 year ago

CharlesNepote commented 1 year ago

Computing ingredient percentages is hard and sometimes leads to data quality errors: https://world.openfoodfacts.org/data-quality-error/en:nutrition-value-over-105-fruits-vegetables-nuts-estimate-from-ingredients (868 errors as of 2023-07-24).

At least one pattern could be taken into account: when a percentage appears just before a parenthesized list of ingredients, the percentages of the ingredients inside the parentheses should not be added. Here is a clear example: https://world.openfoodfacts.org/cgi/product.pl?type=edit&code=3450970052160#ingredients

stephanegigandet commented 1 year ago

The issue is not the parentheses; it's the first part: "purée de tomates mi-réduite et pulpe de tomates (tomates, purée de tomates, correcteur d'acidité : acide citrique) 73%", which we convert to "purée de tomates mi-réduites 73%" + "pulpe de tomates 73%", so the 73% is counted twice. It's a rare pattern.
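To make the double counting concrete, here is a minimal Python sketch of the conversion described above. It assumes a naive parser that drops the parenthesized sub-ingredients, splits the remaining name on " et ", and copies the trailing percentage to each part. The helper names and regular expressions are hypothetical; the actual Open Food Facts parser is written in Perl and may work differently.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical helpers for illustration only; NOT the Open Food Facts parser.

# Matches a trailing percentage such as " 73%" or " 11,5 %".
TRAILING_PERCENT = re.compile(r"\s*(\d+(?:[.,]\d+)?)\s*%\s*$")
# Matches a parenthesized sub-ingredient list (assumes no nested parentheses).
SUB_INGREDIENTS = re.compile(r"\s*\([^)]*\)")


def naive_split(entry: str) -> List[Tuple[str, Optional[float]]]:
    """Split 'A et B (sub-ingredients) 73%' into [(A, 73.0), (B, 73.0)].

    The same percentage is copied to every part, which double-counts
    the compound ingredient.
    """
    text = SUB_INGREDIENTS.sub("", entry).strip()
    match = TRAILING_PERCENT.search(text)
    percent = float(match.group(1).replace(",", ".")) if match else None
    name = TRAILING_PERCENT.sub("", text)
    return [(part.strip(), percent) for part in name.split(" et ")]


entry = ("purée de tomates mi-réduite et pulpe de tomates "
         "(tomates, purée de tomates, correcteur d'acidité : acide citrique) 73%")
parts = naive_split(entry)
print(parts)  # [('purée de tomates mi-réduite', 73.0), ('pulpe de tomates', 73.0)]
print(sum(p for _, p in parts if p is not None))  # 146.0
```

With the percentage copied to both halves, a single 73% ingredient contributes 146% to any sum over the list.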

CharlesNepote commented 1 year ago

You're right.

I've found a relevant example: https://world.openfoodfacts.org/product/3560071400095/pomme-pruneau-carrefour-bio?rev=16


It leads to the following "Details of the analysis of the ingredients": Purée de pommes 74.9%, pruneaux 25%, eau, pruneaux 11.5%, antioxydant (acide ascorbique). Note that "pruneaux" appears twice.
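Summed naively, the listed percentages total 74.9 + 25 + 11.5 = 111.4%, and only the second "pruneaux" entry takes them past 100%. Below is a minimal Python sketch of a consistency check along those lines. It assumes the analysis is available as (name, percentage) pairs. The function names are hypothetical, and this is not the check Open Food Facts actually runs; the 105% threshold is taken only from the name of the data quality error linked above.

```python
from collections import Counter
from typing import List, Optional, Tuple

# Flattened analysis from the example above; None means no percentage was shown.
analysis: List[Tuple[str, Optional[float]]] = [
    ("Purée de pommes", 74.9),
    ("pruneaux", 25.0),
    ("eau", None),
    ("pruneaux", 11.5),
    ("antioxydant (acide ascorbique)", None),
]


def duplicated_names(ingredients: List[Tuple[str, Optional[float]]]) -> List[str]:
    """Hypothetical helper: ingredient names (case-insensitive) listed more than once."""
    counts = Counter(name.lower() for name, _ in ingredients)
    return sorted(name for name, n in counts.items() if n > 1)


def total_percent(ingredients: List[Tuple[str, Optional[float]]]) -> float:
    """Hypothetical helper: sum of the explicit percentages, ignoring missing ones."""
    return sum(p for _, p in ingredients if p is not None)


print(duplicated_names(analysis))         # ['pruneaux']
print(round(total_percent(analysis), 1))  # 111.4
```

Flagging products where a name repeats and the explicit percentages exceed 100% is one plausible way to surface this pattern; it does not by itself decide which of the two percentages is correct.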

This query lists 370+ products with both: