openfoodfacts / openfoodfacts-server

Open Food Facts database, API server and web interface - 🐪🦋 Perl, CSS and JS coders welcome 😊 For helping in Python, see Robotoff or taxonomy-editor
http://openfoodfacts.github.io/openfoodfacts-server/
GNU Affero General Public License v3.0

Ingredients parsing bug "Ingredient A and Ingredient B (81%)" -> "Ingredient A (81%), Ingredient B (81%)" #7816

Open CharlesNepote opened 1 year ago

CharlesNepote commented 1 year ago

Describe the bug

Sometimes the fruit estimate is higher than 105 when it shouldn't be. E.g. https://world.openfoodfacts.org/cgi/product.pl?type=edit&code=3038354191904#ingredients

The JSON file shows 174.5 in this example.

This sometimes triggers the data quality error "Nutrition value over 105 - Fruits vegetables nuts estimate from ingredients".

stephanegigandet commented 1 year ago

The issue comes from ingredient parsing: we turn "tomato pulp and tomato puree (72%)" into "tomato pulp (72%), tomato puree (72%)".

One solution could be to make it a composite ingredient instead: "tomato pulp and tomato puree (72%) (tomato pulp, tomato puree)".
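
To make the effect concrete, here is a minimal Python sketch of the behaviour described above. It is not the server's Perl parser: the regular expression and the `naive_split` helper are hypothetical and exist only to show how copying the percentage onto both halves doubles the total.

```python
import re

# Hypothetical pattern for "<A> and <B> (<N>%)"; the real parser is more general.
AND_WITH_PERCENT = re.compile(
    r"^(?P<a>.+?) and (?P<b>.+?) \((?P<pct>\d+(?:\.\d+)?)%\)$"
)


def naive_split(text):
    """Split 'A and B (N%)' into two ingredients, copying N% onto both."""
    m = AND_WITH_PERCENT.match(text)
    if m is None:
        return [(text, None)]
    pct = float(m.group("pct"))
    return [(m.group("a"), pct), (m.group("b"), pct)]


parsed = naive_split("tomato pulp and tomato puree (72%)")
total = sum(pct for _, pct in parsed if pct is not None)

print(parsed)  # [('tomato pulp', 72.0), ('tomato puree', 72.0)]
print(total)   # 144.0
```

Both halves count as tomato, so an estimate that sums these percentages ends up well above 100.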
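
A rough Python sketch of the composite idea, under assumptions: the `Ingredient` class, `as_composite`, and `top_level_percent_sum` are hypothetical names, not the server's data model. The point is only that the percentage is attached once, to the parent, while the two parts become sub-ingredients without a percentage of their own.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Ingredient:
    # Hypothetical structure: a name, an optional stated percent, optional sub-ingredients.
    text: str
    percent: Optional[float] = None
    ingredients: List["Ingredient"] = field(default_factory=list)


def as_composite(a, b, percent):
    """Represent 'A and B (N%)' as one parent holding N%, with A and B nested inside."""
    return Ingredient(
        text=f"{a} and {b}",
        percent=percent,
        ingredients=[Ingredient(a), Ingredient(b)],
    )


def top_level_percent_sum(ingredients):
    """Sum the stated percentages of top-level ingredients only."""
    return sum(i.percent for i in ingredients if i.percent is not None)


composite = as_composite("tomato pulp", "tomato puree", 72.0)
print(top_level_percent_sum([composite]))      # 72.0
print([sub.text for sub in composite.ingredients])  # ['tomato pulp', 'tomato puree']
```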

benbenben2 commented 11 months ago

> The issue comes from ingredient parsing: we turn "tomato pulp and tomato puree (72%)" into "tomato pulp (72%), tomato puree (72%)".
> One solution could be to make it a composite ingredient instead: "tomato pulp and tomato puree (72%) (tomato pulp, tomato puree)".

Yes, but if your input is "tomato pulp and tomato puree (72%) (tomato 95%, water 5%)", then your output would be something like "tomato pulp and tomato puree (72%) (tomato pulp, tomato puree) (tomato 95%, water 5%)". I am not sure that is good.

I would rather simply not split it in two if there is a percentage.
(+) We will not have a problem if it is a compound/preparation.
(+) It solves the percentage analysis.
(-) We have to add the whole compound to the taxonomy.
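
The "do not split when a percentage is present" rule could look roughly like the following Python sketch. The `parse_and` helper and its regular expression are assumptions made for illustration, not the server's Perl code, and nested parentheses such as "(tomato 95%, water 5%)" are deliberately not handled.

```python
import re

# Hypothetical pattern: a trailing "(N%)" at the end of the ingredient text.
TRAILING_PERCENT = re.compile(r"\s*\((\d+(?:\.\d+)?)\s*%\)\s*$")


def parse_and(text):
    """Keep 'A and B (N%)' as one ingredient; split on ' and ' only without a percent."""
    m = TRAILING_PERCENT.search(text)
    if m:
        name = text[: m.start()].strip()
        return [(name, float(m.group(1)))]
    return [(part.strip(), None) for part in text.split(" and ")]


print(parse_and("tomato pulp and tomato puree (72%)"))
# [('tomato pulp and tomato puree', 72.0)]
print(parse_and("salt and pepper"))
# [('salt', None), ('pepper', None)]
```

The unsplit name "tomato pulp and tomato puree" then has to be recognised as a whole, which is the taxonomy cost listed under (-).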