openfoodfacts / openfoodfacts-server

Open Food Facts database, API server and web interface - 🐪🦋 Perl, CSS and JS coders welcome 😊 For helping in Python, see Robotoff or taxonomy-editor
GNU Affero General Public License v3.0

Improve allergens and traces detection from ingredients lists #1242

Open stephanegigandet opened 6 years ago

stephanegigandet commented 6 years ago

We have two fields for allergens: allergens and traces. The allergens field is derived from the ingredients list in the product's main language, while the traces field is filled in by users through the traces field of the product edit form.

Bug #1238 is going to change how the fields are populated: both allergens and traces will be extracted from the ingredients list, and users will have the opportunity to add (and only add) allergens with fields in the product edit form.

This new bug is to track changes and improvements to the allergens and traces detection.

In particular:

- Detect whether an allergen in the ingredients list is an actual ingredient ("contains") or a trace, and put it in the correct field.
- Add detection for allergens in parentheses (e.g. farine de blé (gluten)).
- Detect allergens that are not in bold, underscores, parentheses or all caps, but that are listed as a separate ingredient (e.g. "lait" and "crème", but not "crème de cassis").
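To make the three points above concrete, here is a minimal sketch in Python. It is not the server's implementation (the server is written in Perl). The lexicon `ALLERGENS`, the pattern `TRACES_MARKER` and the function names are all hypothetical, and the French phrases are only a small assumed sample. It treats an ingredient as an allergen only when the whole ingredient name matches a lexicon entry, so "crème" matches but "crème de cassis" does not.

```python
import re

# Hypothetical, tiny French lexicon: surface form -> allergen tag.
ALLERGENS = {
    "lait": "milk",
    "crème": "milk",
    "gluten": "gluten",
    "oeufs": "eggs",
    "soja": "soybeans",
    "noisettes": "nuts",
}

# Assumed phrases that introduce a traces statement (not exhaustive).
TRACES_MARKER = re.compile(
    r"peut contenir(?:\s+des)?(?:\s+traces?)?|traces?\s+(?:éventuelles?\s+)?de",
    re.IGNORECASE,
)


def split_top_level(text):
    """Split on commas that are not inside parentheses."""
    parts, depth, current = [], 0, []
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth = max(depth - 1, 0)
        if ch == "," and depth == 0:
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    parts.append("".join(current).strip())
    return [p for p in parts if p]


def allergens_in_ingredient(ingredient):
    """Match the whole ingredient name, plus any items in parentheses."""
    found = set()
    inner_groups = re.findall(r"\(([^()]*)\)", ingredient)
    name = re.sub(r"\([^()]*\)", "", ingredient).strip(" ._*").lower()
    if name in ALLERGENS:  # "crème" matches, "crème de cassis" does not
        found.add(ALLERGENS[name])
    for group in inner_groups:  # e.g. "farine de blé (gluten)"
        for item in re.split(r"[,;]", group):
            item = item.strip(" ._*").lower()
            if item in ALLERGENS:
                found.add(ALLERGENS[item])
    return found


def detect_allergens(text):
    """Return allergens ("contains") and traces found in an ingredients text."""
    allergens, traces = set(), set()
    for sentence in re.split(r"\.\s*", text):
        if not sentence.strip():
            continue
        marker = TRACES_MARKER.search(sentence)
        if marker:
            # Everything after the marker is read as a list of trace allergens.
            tail = sentence[marker.end():]
            for item in re.split(r",|\bet\b", tail):
                item = re.sub(r"^(?:des\s+|de\s+|d')", "", item.strip().lower())
                if item in ALLERGENS:
                    traces.add(ALLERGENS[item])
        else:
            for ingredient in split_top_level(sentence):
                allergens |= allergens_in_ingredient(ingredient)
    return {"allergens": sorted(allergens), "traces": sorted(traces)}


if __name__ == "__main__":
    sample = ("farine de blé (gluten), sucre, lait, crème de cassis. "
              "Peut contenir des traces de soja et de noisettes.")
    print(detect_allergens(sample))
```

Exact matching on the whole ingredient name is deliberately strict. A real detector would presumably rely on the allergens taxonomy and per-language synonyms rather than a hard-coded dictionary.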

teolemon commented 5 years ago

Example in Spanish: "Puede contener trazas de cacahuete, huevo y frutos de cáscara." ("May contain traces of peanut, egg and tree nuts.") https://world.openfoodfacts.org/product/8431876038477

teolemon commented 5 years ago

https://es.openfoodfacts.org/producto/8480017057167/batido-cacao-dia

aleene commented 5 years ago

Sounds good. Do you need sample sentences that are used in ingredients lists?