openfoodfacts / openfoodfacts-server

Open Food Facts database, API server and web interface - 🐪🦋 Perl, CSS and JS coders welcome 😊 For helping in Python, see Robotoff or taxonomy-editor
http://openfoodfacts.github.io/openfoodfacts-server/
GNU Affero General Public License v3.0

Quality error: language mismatch #10997

Open aleene opened 2 weeks ago

aleene commented 2 weeks ago

Problem

The language of the product name and/or ingredients can differ from the language of the field. For the product name this has no big consequences, but it can be annoying for users when they see a language they do not understand. A language mismatch in the ingredients has a real consequence: the NOVA calculation is not possible.

The easiest way to detect these errors for the product name is to make a word cloud.

[Image: word cloud of English product names for walnut products]

Just by looking at this English word cloud for walnut product names, one already sees German, Dutch and French texts.
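
The word counts behind such a word cloud could be produced with a few lines of Python. This is only a minimal sketch: the `sample_names` list is made-up data standing in for the values of an English product-name field, and `word_frequencies` is a hypothetical helper, not an existing function in this repository.

```python
import re
from collections import Counter


def word_frequencies(names):
    """Count lowercase word occurrences across a list of product names."""
    counts = Counter()
    for name in names:
        # [^\W\d_]+ matches runs of letters, including accented ones.
        counts.update(re.findall(r"[^\W\d_]+", name.lower()))
    return counts


# Made-up values of an English product-name field in a walnut category.
sample_names = [
    "Walnut halves",
    "Walnüsse",
    "Cerneaux de noix",
    "Organic walnuts",
    "Walnoten",
    "Walnut pieces",
]

frequencies = word_frequencies(sample_names)
for word, count in frequencies.most_common():
    print(word, count)
```

Words such as "walnüsse", "noix" or "walnoten" showing up in the counts for an English field are the same signal the word cloud gives visually.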

For ingredients one has to go through all the ingredients in use for a specific category.

Proposed solution

Try to determine the language of the product name and ingredients based on the taxonomies. Google Translate works pretty well, but with our available information we should be able to make a more specific model(?).

If a mismatch is detected, flag it, so we know it exists and can repair it.
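
To make the idea concrete, here is a rough Python sketch of taxonomy-based detection plus flagging. Everything in it is an assumption for illustration: `TAXONOMY_TERMS` is a tiny hand-written stand-in for the real ingredients taxonomy, the 50% threshold is arbitrary, and the flag name `language-mismatch-...` is invented rather than an existing data quality tag.

```python
import re

# Hypothetical miniature taxonomy: language code -> known ingredient names.
# The real ingredients taxonomy is far larger and has synonyms per language.
TAXONOMY_TERMS = {
    "en": {"walnuts", "sugar", "salt", "wheat flour", "water"},
    "fr": {"noix", "sucre", "sel", "farine de blé", "eau"},
    "de": {"walnüsse", "zucker", "salz", "weizenmehl", "wasser"},
}


def split_ingredients(text):
    """Split an ingredients list on common separators and normalise case."""
    parts = re.split(r"[,;()]", text.lower())
    return [part.strip(" .") for part in parts if part.strip(" .")]


def guess_language(text):
    """Return (language, share of ingredients recognised in that language)."""
    ingredients = split_ingredients(text)
    if not ingredients:
        return None, 0.0
    scores = {
        lang: sum(1 for ingredient in ingredients if ingredient in terms)
        for lang, terms in TAXONOMY_TERMS.items()
    }
    best = max(scores, key=scores.get)
    if scores[best] == 0:
        return None, 0.0
    return best, scores[best] / len(ingredients)


def check_language_mismatch(field_lang, text, min_share=0.5):
    """Return an invented flag name if the text looks like another language."""
    guessed, share = guess_language(text)
    if guessed is None or share < min_share:
        return None  # not enough evidence to say anything
    if guessed != field_lang:
        return f"language-mismatch-{field_lang}-detected-{guessed}"
    return None


# A French ingredients list stored in the English field gets flagged.
print(check_language_mismatch("en", "Farine de blé, sucre, noix, sel"))
```

Exact matching against taxonomy entries is deliberately conservative: it only flags when at least half of the ingredients are recognised in another language, so texts it cannot recognise are left alone rather than flagged wrongly.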

Additional context

Some products use an English name, but do not have associated ingredients or nutritional values in that language. Usually another language is then used as the main language, but the English name is kept (Lidl, for instance).

Number of products impacted

Would not be surprised if this is 10%.

Time per product

Requires an edit for each product to repair this.