openfoodfacts / openfoodfacts-server

Open Food Facts database, API server and web interface - 🐪🦋 Perl, CSS and JS coders welcome 😊 For helping in Python, see Robotoff or taxonomy-editor
http://openfoodfacts.github.io/openfoodfacts-server/
GNU Affero General Public License v3.0

Autoreplace " by * for Organic Ingredients in OCR output #1906

Open teolemon opened 5 years ago

teolemon commented 5 years ago

Automatically replace " with * in the OCR output when "ingredients issus de l'agriculture biologique" ("ingredients from organic farming") is detected. Example OCR output:

Farine de pois chiche (26%), farine de mais, semoule de mais, farine de riz", tomates/basie 8% (farine de mais, tomate", basilic", oignon', sel), huile de tournesol".

stephanegigandet commented 5 years ago

Which product is this? How common is this pattern?

We currently do not support identifying organic ingredients marked with a *.

teolemon commented 5 years ago

https://world.openfoodfacts.org/product/3770008009417/l-apero-boules-tomate-basilic-bio-funky-veggie

It would just be a replacement at OCR time:

if "ingredients issus de l'agriculture biologique" in string: REPLACE(/word", >> /word*,)
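The pseudocode above could be sketched in Python roughly as follows. This is only an illustration, not code from the server: the function name `replace_organic_marker` and both regular expressions are assumptions, and the trigger phrase is matched with or without the accent because OCR may drop it.

```python
import re

# Assumption: the legend that signals organic markers, matched
# case-insensitively and with or without the accent on "é".
ORGANIC_PHRASE = re.compile(
    r"ingr[ée]dients issus de l'agriculture biologique", re.IGNORECASE
)

# A double quote that directly follows a word character, e.g. `riz"`.
STRAY_QUOTE = re.compile(r'(?<=\w)"')


def replace_organic_marker(ocr_text: str) -> str:
    """Replace word-final double quotes with * if the organic legend is present."""
    if not ORGANIC_PHRASE.search(ocr_text):
        # No legend found: leave the OCR output untouched.
        return ocr_text
    return STRAY_QUOTE.sub("*", ocr_text)
```

Note that this would also rewrite a genuine closing quotation mark after a quoted word, so it is only safe if such quotes are rare in ingredient lists.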

aleene commented 5 years ago

Note that not every * indicates organic: I have already identified two other uses.

stephanegigandet commented 5 years ago

There are also products that use more than one symbol (e.g. organic + fair trade). Sometimes they use a small superscript 1, which can look like a ' from an OCR point of view.

Before doing anything, we should look at many OCR result samples to see how common this is and exactly what we should do in each case.
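One way to run such a survey is to count which symbols appear at the end of ingredient names across a batch of OCR texts. The sketch below is hypothetical: the function name `count_marker_candidates`, the set of candidate symbols, and the rule for "end of an ingredient name" (a letter, then the symbol, then punctuation, a digit, or the end of the text) are all assumptions chosen for illustration.

```python
import re
from collections import Counter

# Assumption: a candidate marker is one of these symbols, placed right after
# a letter and followed by punctuation, a digit (e.g. a percentage) or the
# end of the text. Apostrophes inside words such as "d'avoine" are followed
# by a letter, so they are not counted.
MARKER_CANDIDATE = re.compile(
    r"(?<=[^\W\d_])([\"'*°¹²])(?=\s*(?:[,.;)]|\d|$))"
)


def count_marker_candidates(samples):
    """Count candidate marker symbols over an iterable of OCR texts."""
    counts = Counter()
    for text in samples:
        counts.update(MARKER_CANDIDATE.findall(text))
    return counts


samples = [
    "flocons d'avoine complet' 39%, miel', sel de mer.",
    "farine de riz\", tomate\", oignon', sel",
]
print(count_marker_candidates(samples))
```

Run over a large set of real OCR results, the counts would give a first idea of which symbols are common enough to handle.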

stephanegigandet commented 4 years ago

From @teolemon:

https://fr.openfoodfacts.org/produit/26017341/muesli-chocolat-amarante-aldi-bon-et-bio

flocons d'avoine complet' 39%, amarante soufflée' 12%, chocolat au lait' 12% (sucre de canne', poudre de lait entier', beurre de cacao', pâte de cacao'), sucre de canne', farine d'avoine complet', semoule de maiïs', huile de tournesol', flocons de blé complet' 5%, farine d'épeautre complet', poudre de cacao maigre', flocons de noix de coco', miel', noisettes concassées', sel de mer. 'Ingrédients issus de l'agriculture biologique.
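In this sample the marker is ' rather than ", and the same symbol sits directly in front of the legend. A possible approach, sketched here under that assumption, is to read the marker from the legend instead of hard-coding it. The function name `normalize_marker_from_legend` and the regular expressions are hypothetical, and the sketch deliberately ignores products that use several symbols.

```python
import re

# Assumption: the marker is the single non-space character that immediately
# precedes the organic legend (optionally separated by whitespace).
LEGEND = re.compile(
    r"(\S)\s*ingr[ée]dients issus de l'agriculture biologique", re.IGNORECASE
)


def normalize_marker_from_legend(ocr_text: str) -> str:
    """Rewrite the marker found before the organic legend as * throughout."""
    match = LEGEND.search(ocr_text)
    if not match:
        return ocr_text
    marker = match.group(1)
    # Letters, digits and ordinary punctuation are not treated as markers.
    if marker.isalnum() or marker in ".,;:()":
        return ocr_text
    escaped = re.escape(marker)
    # Marker at the end of an ingredient name: after a letter and before
    # punctuation, a digit or the end of the text.
    after_ingredient = re.compile(
        r"(?<=[^\W\d_])" + escaped + r"(?=\s*(?:[,.;)]|\d|$))"
    )
    text = after_ingredient.sub("*", ocr_text)
    # Marker in front of the legend itself.
    before_legend = re.compile(
        escaped + r"(?=\s*ingr[ée]dients issus)", re.IGNORECASE
    )
    return before_legend.sub("*", text)
```

Apostrophes inside words ("d'avoine", "l'agriculture") are left alone because they are followed by a letter.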