openfoodfacts / openfoodfacts-server

Open Food Facts database, API server and web interface - 🐪🦋 Perl, CSS and JS coders welcome 😊 For helping in Python, see Robotoff or taxonomy-editor
GNU Affero General Public License v3.0

Ingredient Parsing - "Wheatflour contains Gluten (...)" & "Farine de blé contient Gluten (...)" #4232

Open svensven opened 4 years ago

svensven commented 4 years ago

On M&S products where "contains Gluten" isn't in brackets, the whole ingredient gets stripped out of the parsed ingredient list, in both English and French. Wrapping "contains Gluten" in brackets makes the ingredient and its sub-ingredients appear again.


French: https://uk.openfoodfacts.org/product/00817851/naans-m-s


English: https://uk.openfoodfacts.org/product/00701662/bourbon-biscuits-marks-spencer


svensven commented 3 years ago

The same happens in Spanish: https://es.openfoodfacts.org/producto/7804658851273/choco-bolitas-merkat

oxido de zinc. Contiene gluten (cebada)

becomes

oxido de zinc (cebada)

stephanegigandet commented 3 years ago

It's an unwanted side effect of this code in Ingredients.pm:

        # $contains_or_may_contain_regexp may be the end of a sentence, remove the beginning
        # e.g. this product has been manufactured in a factory that also uses...
        # Some text with comma May contain ... -> Some text with comma, May contain
        # ! does not work in German and languages that have words with a capital letter
        if ($product_lc ne "de") {
            my $ucfirst_contains_or_may_contain_regexp = $contains_or_may_contain_regexp;
            $ucfirst_contains_or_may_contain_regexp =~ s/(^|\|)(\w)/$1 . uc($2)/ieg;
            $text =~ s/([a-z]) ($ucfirst_contains_or_may_contain_regexp)/$1, $2/g;
        }
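
To see what these two substitutions do in isolation, here is a minimal sketch. The value of `$contains_or_may_contain_regexp` is a simplified stand-in (an assumption, not the real per-language list in Ingredients.pm), and `insert_comma` is a hypothetical wrapper name used only for this illustration. It shows the comma insertion only; how the resulting "Contains ..." segment is handled later in the parser is not covered here.

```perl
use strict;
use warnings;

# Assumption: simplified stand-in for the real per-language regexp.
my $contains_or_may_contain_regexp = "contains|may contain";

# Hypothetical wrapper around the two substitutions quoted above.
sub insert_comma {
    my ($text, $product_lc) = @_;
    if ($product_lc ne "de") {
        my $ucfirst_contains_or_may_contain_regexp = $contains_or_may_contain_regexp;
        # Uppercase the first letter of each alternative: "Contains|May contain"
        $ucfirst_contains_or_may_contain_regexp =~ s/(^|\|)(\w)/$1 . uc($2)/ieg;
        # Insert a comma between a lowercase letter and the capitalized phrase
        $text =~ s/([a-z]) ($ucfirst_contains_or_may_contain_regexp)/$1, $2/g;
    }
    return $text;
}

print insert_comma("Wheatflour Contains Gluten (Wheat), Sugar", "en"), "\n";
# prints: Wheatflour, Contains Gluten (Wheat), Sugar
```

With this stand-in, the inserted comma separates "Wheatflour" from "Contains Gluten (Wheat)", so the two no longer read as a single ingredient. Text with `$product_lc` set to "de" is returned unchanged.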

We should probably keep it for "may contain", but clearly not for "contains". Instead, we should match "contains X and Z" and tag the ingredient with it as a label.
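
One possible shape for the "match and tag as a label" idea is sketched below. This is only an illustration under stated assumptions: `extract_contains_label` is a hypothetical helper that does not exist in Ingredients.pm, the pattern is English-only, and a real implementation would need the per-language "contains" phrases and the project's own way of storing labels.

```perl
use strict;
use warnings;

# Hypothetical helper: split "Ingredient contains X (subs)" into the
# ingredient text and a "contains" label, instead of inserting a comma.
# Assumption: English-only pattern, for illustration.
sub extract_contains_label {
    my ($ingredient) = @_;
    my $label;
    # Capture what follows "contains", up to an opening bracket or end of string
    if ($ingredient =~ s/\s+contains\s+([^(),]+?)\s*(?=\(|\z)/ /i) {
        $label = "contains " . lc($1);
    }
    # Drop the trailing space left behind when nothing follows the phrase
    $ingredient =~ s/\s+$//;
    return ($ingredient, $label);
}

my ($text, $label) = extract_contains_label(
    "Wheatflour contains Gluten (Wheatflour, Calcium Carbonate)");
print "$text\n";     # prints: Wheatflour (Wheatflour, Calcium Carbonate)
print "$label\n";    # prints: contains gluten
```

The ingredient and its sub-ingredients stay together, and the "contains" statement is kept as separate data rather than being turned into its own comma-separated entry.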

github-actions[bot] commented 7 months ago

This issue has been open 90 days with no activity. Can you give it a little love by linking it to a parent issue, adding relevant labels and projects, creating a mockup if applicable, adding code pointers from https://github.com/openfoodfacts/openfoodfacts-server/blob/main/.github/labeler.yml, giving it a priority, editing the original issue to have a more comprehensive description… Thank you very much for your contribution to 🍊 Open Food Facts