stephanegigandet opened this issue 5 years ago.
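Issues:
- `Senf - saaten` (spaced hyphen breaking up Senfsaaten, "mustard seeds")
- `, Ge - würze` (Gewürze, "spices")
- `_soja_lecithine` (Dutch sojalecithine, "soy lecithin", with allergen underscores)
- `,_tarwe_bloem` (Dutch tarwebloem, "wheat flour", with allergen underscores)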
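The spacing and underscore problems listed above could in principle be handled by a small normalization pass before the list is split. The following is a minimal, hypothetical Python sketch (`normalize` and its helpers are illustrative names, not Open Food Facts code). It assumes that a hyphen surrounded by spaces and followed by a lowercase letter is a broken compound, that `und`/`oder` right after such a hyphen marks a suspended compound like the “Mono - und …” case in the table below, and that underscores only mark allergens and can be dropped.

```python
import re

# Hypothetical cleanup pass; an illustration, not the Open Food Facts parser.
CONJUNCTIONS = {"und", "oder"}  # German "and" / "or"


def fix_spaced_hyphens(text: str) -> str:
    """Join 'Senf - saaten' into 'Senfsaaten'; turn 'Mono - und' into 'Mono- und'."""
    def repl(match: re.Match) -> str:
        left, right = match.group(1), match.group(2)
        if right in CONJUNCTIONS:
            # Suspended compound such as "Mono- und Diglyceride": keep the hyphen.
            return f"{left}- {right}"
        # Only a lowercase continuation is joined, so "Salz - Pfeffer" is untouched.
        return left + right

    return re.sub(r"(\w+) - ([a-zäöüß]\w*)", repl, text)


def strip_allergen_underscores(text: str) -> str:
    """Drop '_' allergen highlighting: '_soja_lecithine' -> 'sojalecithine'."""
    return text.replace("_", "")


def normalize(text: str) -> str:
    return strip_allergen_underscores(fix_spaced_hyphens(text))


for raw in ["Senf - saaten", "Ge - würze", "_soja_lecithine", "_tarwe_bloem"]:
    print(f"{raw!r} -> {normalize(raw)!r}")
```

Joining only when the right-hand part starts with a lowercase letter keeps separators such as `Salz - Pfeffer` intact, but it would still wrongly join a dash-separated pair whose second word happens to be lowercase.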
QS (or QS-Ware) is a label for certain products, mostly meat, produced under certain (good/favourable) conditions. I tagged a few products with "QS" before: https://world.openfoodfacts.org/label/qs
You are right that it probably shouldn't be listed in the ingredients. I guess it ended up there because OCR doesn't filter out this kind of text?
@chk1 @aleene: I had a look at the lists with QS-Ware. We can probably use the same parsing feature we already have for organic and/or fair-trade ingredients (lists like "Sugar, salt. : organic").
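The organic/fair-trade style parsing mentioned above boils down to attaching a trailing footnote label to the ingredients that carry a marker. Below is a minimal, hypothetical Python sketch (`parse_marked_list` is an illustrative name, not Open Food Facts code). It assumes the marker is an asterisk and that the label follows it in a final sentence, as in the made-up example `Schweinefleisch*, Salz, Gewürze. *QS-Ware`. It splits on plain commas only, so decimal commas and nested parentheses are out of scope.

```python
import re
from typing import Optional


def parse_marked_list(text: str, marker: str = "*") -> list[tuple[str, Optional[str]]]:
    """Attach a trailing footnote label (e.g. '*: organic', '*QS-Ware') to marked items."""
    # Footnote = a final sentence that starts with the marker, optionally followed by ':'.
    footnote = re.match(
        r"^(.*?)\.\s*" + re.escape(marker) + r"\s*:?\s*(.+?)\.?\s*$", text
    )
    if footnote:
        body, label = footnote.group(1), footnote.group(2)
    else:
        body, label = text.strip().rstrip("."), None

    result = []
    for item in re.split(r",\s*", body):
        item = item.strip()
        if not item:
            continue  # skip empty items such as those produced by ",,"
        if label is not None and item.endswith(marker):
            result.append((item[: -len(marker)].strip(), label))
        else:
            result.append((item, None))
    return result


print(parse_marked_list("Schweinefleisch*, Salz, Gewürze. *QS-Ware"))
```

With this approach "QS-Ware" ends up as a label on the marked ingredient instead of being parsed as an ingredient of its own.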
Applied @aleene's changes for German ingredients to all products. We are now at 80.00% recognized ingredient occurrences for German:
Type | Unique tags | Occurrences |
---|---|---|
known | 1791 (5.56%) | 225071 (80.00%) |
unknown | 30398 (94.43%) | 56284 (20.00%) |
all | 32190 (100.00%) | 281355 (100.00%) |
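The percentages in these tables are plain shares of the "all" row. A minimal Python sketch that recomputes them (the helper names are illustrative): for example, 225071 of 281355 occurrences is about 79.995%, which rounds to the 80.00% shown.

```python
def pct(part: int, whole: int) -> str:
    """Percentage with two decimals, as used in the stats tables."""
    return f"{100 * part / whole:.2f}%"


def stats_rows(stats: dict[str, tuple[int, int]]) -> list[str]:
    """Format (unique tags, occurrences) per type in the same layout as the table."""
    total_tags, total_occ = stats["all"]
    return [
        f"{name} | {tags} ({pct(tags, total_tags)}) | {occ} ({pct(occ, total_occ)}) |"
        for name, (tags, occ) in stats.items()
    ]


german_stats = {
    "known": (1791, 225071),
    "unknown": (30398, 56284),
    "all": (32190, 281355),
}

for row in stats_rows(german_stats):
    print(row)
```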
I went through all ingredients from https://de.openfoodfacts.org/ingredients?status=unknown with unknown status and at least 50 occurrences.
Quite a few of them are already in ingredients.txt. Why do they still show up in the list with unknown status?
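One way to narrow this down is to check each unknown name against the taxonomy file directly. Names that match may point to products that have not been re-parsed yet, or to a normalization mismatch. The sketch below is hypothetical: it assumes a simplified version of the ingredients.txt layout (`lc:name, synonym, ...` lines, entries separated by blank lines) and a made-up `normalize_tag` that only lowercases and hyphenates, whereas the real Open Food Facts normalization does more.

```python
def normalize_tag(name: str) -> str:
    """Hypothetical tag normalization: lowercase, whitespace collapsed to hyphens."""
    return "-".join(name.lower().split())


def load_synonyms(taxonomy_text: str, lang: str) -> set[str]:
    """Collect normalized names from 'lc:name, synonym, ...' lines for one language."""
    prefix = f"{lang}:"
    names = set()
    for line in taxonomy_text.splitlines():
        line = line.strip()
        if not line.startswith(prefix):
            continue  # other languages, parent lines ('< en:...'), comments, blanks
        for name in line[len(prefix):].split(","):
            if name.strip():
                names.add(normalize_tag(name))
    return names


def listed_but_unknown(unknown_names: list[str], taxonomy_text: str, lang: str = "de") -> list[str]:
    """Return the 'unknown' names that the taxonomy text already covers."""
    synonyms = load_synonyms(taxonomy_text, lang)
    return [name for name in unknown_names if normalize_tag(name) in synonyms]


# Illustrative taxonomy excerpt, not copied from the real ingredients.txt.
sample_taxonomy = """
en:Mustard seed, mustard seeds
de:Senfsaat, Senfsaaten

en:Salt
de:Salz
"""

print(listed_but_unknown(["Senfsaaten", "Würze"], sample_taxonomy))
```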
sign | meaning |
---|---|
! | Already in ingredients.txt |
v | Added to ingredients.txt in #2323 |
x | needs further follow-up |
? | may safely be ignored |
Sign | Occurrences | Notes |
---|---|---|
! | 244 | Already in ingredients.txt |
v | 171 | Added to ingredients.txt |
v | 163 | Added to ingredients.txt |
? | 157 | Empty string, probably from something like “,,” |
v | 155 | Added to ingredients.txt |
? | 155 | Belongs to “Fett i. Tr.” (fat in dry matter), can be safely ignored. |
! | 151 | Already in ingredients.txt |
! | 135 | Already in ingredients.txt |
x | 114 | Used in many sausage products; probably glucono delta-lactone (E 575) according to https://web.archive.org/web/20180719074852/https://www.merkur.de/wirtschaft/sind-tricks-lebensmittelindustrie-zr-7303827.html |
x | 111 | See https://de.wikipedia.org/wiki/W%C3%BCrze_(Lebensmittel), https://www.lebensmittelklarheit.de/informationen/wuerze-hat-mit-gewuerzen-nicht-viel-zu-tun and “Leitsätze für Gewürze und andere würzende Zutaten” in: Deutsches Lebensmittelbuch, Deutsche Lebensmittelbuch-Kommission. May contain soy or wheat. Edit: added to ingredients.txt as a new entry. |
! | 108 | Already in ingredients.txt |
! | 104 | Already in ingredients.txt |
? | 100 | Could be added as “release agent”? Can be safely ignored, because the release agent used is always specified. |
! | 99 | Already in ingredients.txt |
v | 96 | Added to ingredients.txt |
v | 95 | Always in combination with vegetable fat. Added to ingredients.txt |
! | 92 | Already in ingredients.txt |
v | 89 | Added to ingredients.txt |
v | 89 | Added to ingredients.txt |
! | 85 | Already in ingredients.txt |
v | 83 | Added to ingredients.txt |
v | 82 | Added to ingredients.txt |
! | 79 | Already in ingredients.txt |
x | 77 | Should be changed to Säuerungsmittel (acidifier) in the affected products → bot? |
v | 76 | Added to ingredients.txt |
! | 76 | Already in ingredients.txt |
! | 75 | Already in ingredients.txt, but only with a German entry |
v | 75 | Added to ingredients.txt |
x | 75 | Needs parsing/OCR improvements; often appears in combinations like “Frische Vollmilch, 3,5% Fett, pasteurisiert, homogenisiert.” |
! | 74 | Added to ingredients.txt |
? | 73 | Belongs to “Fett i. Tr.” (fat in dry matter), can be safely ignored. |
! | 73 | Already in ingredients.txt, but only with a German entry |
v | 71 | Added to ingredients.txt. Milk categories need further cleanup, e.g. Magermilch (0.3%) is in the semi-skimmed milk section and fettarme Milch (>1.5%) is in the skimmed milk section. |
v | 67 | Added to ingredients.txt |
x | 66 | Needs parsing/OCR improvements; often appears in combinations like “Frische Vollmilch, 3,5% Fett, pasteurisiert, homogenisiert.” |
! | 64 | Already in ingredients.txt |
! | 62 | Already in ingredients.txt |
v | 60 | Added to ingredients.txt |
! | 60 | Already in ingredients.txt |
v | 58 | Added to ingredients.txt |
v | 58 | Added to ingredients.txt |
! | 55 | Already in ingredients.txt |
v | 54 | Added to ingredients.txt |
x | 54 | Parsing/OCR error. Should be changed in the affected products from “Mono - und …” to “Mono- und …” → bot? |
! | 54 | Already in ingredients.txt |
! | 54 | Already in ingredients.txt |
? | 53 | Can be safely ignored |
! | 53 | Already in ingredients.txt |
v | 52 | Usually in cheeses (also in a few meat products). Added to ingredients.txt |
? | 51 | Can be safely ignored |
x | 50 | Should be changed to Speisesalz (table salt) in the affected products → bot? |
v | 50 | Added to ingredients.txt |
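Several of the rows above (the empty string from “,,” and the “Frische Vollmilch, 3,5% Fett, …” combinations) come from splitting on every comma. A minimal, hypothetical Python sketch of a splitter that keeps German decimal commas and commas inside parentheses, and drops empty items (`split_ingredients` is an illustrative name, not Open Food Facts code):

```python
def split_ingredients(text: str) -> list[str]:
    """Split an ingredient list on top-level commas.

    Commas between two digits (German decimal comma, e.g. '3,5%') and commas
    inside parentheses/brackets are kept; empty items from ',,' are dropped.
    """
    text = text.strip().rstrip(".")
    items, current, depth = [], [], 0
    for i, ch in enumerate(text):
        if ch in "([":
            depth += 1
        elif ch in ")]":
            depth = max(depth - 1, 0)
        decimal_comma = (
            ch == ","
            and 0 < i < len(text) - 1
            and text[i - 1].isdigit()
            and text[i + 1].isdigit()
        )
        if ch == "," and depth == 0 and not decimal_comma:
            items.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    items.append("".join(current).strip())
    return [item for item in items if item]


print(split_ingredients("Frische Vollmilch, 3,5% Fett, pasteurisiert, homogenisiert."))
```

This only fixes the splitting. Treating “3,5% Fett, pasteurisiert, homogenisiert” as properties of the milk rather than as separate ingredients would need additional logic.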
I added many of those. Maybe it takes some time before everything is parsed again?
Type | Unique tags | Occurrences |
---|---|---|
known | 2084 (5.65%) | 335677 (85.07%) |
unknown | 34796 (94.35%) | 58890 (14.93%) |
all | 36881 (100.00%) | 394567 (100.00%) |
Feb 7th 2020
Meta bug to track the issues with parsing ingredient lists in German.
See also bug #2023 for general ingredient parsing improvements in all languages.
Current status:
https://de.openfoodfacts.org/ingredients?stats=1
27 Aug 2019