openfoodfacts / openfoodfacts-server

Open Food Facts database, API server and web interface - 🐪🦋 Perl, CSS and JS coders welcome 😊 For helping in Python, see Robotoff or taxonomy-editor
GNU Affero General Public License v3.0

Assuming 0 when the fiber value is not given on the package leads to a wrong Nutri-Score #8160

Open fabi003 opened 1 year ago

fabi003 commented 1 year ago

Describe the bug

Rye whole-grain breads, such as https://world.openfoodfacts.org/product/4007933457419/pumpernickel-go-bio, have a high fiber content. But on the product page, the computed Nutri-Score is only C instead of the A printed on the package. After some testing with the Nutri-Score calculator at https://nutrirechner.xyz/en/, it turns out that the OFF server seems to assume 0 g of fiber, even when the fiber field is filled with “-” (as documented for values not given on the package) instead of being left blank.


But if you look at similar products in this whole-grain rye bread category, especially https://world.openfoodfacts.org/product/4007933252045/pumpernickel-delba, which is the same product, made in the same factory (Delba Backbetrieb GmbH) but marketed under a different (discounter) brand, you can see that it contains at least 7.9 g of fiber per 100 g.

If you then enter 7.9 g of fiber into the Nutri-Score calculator, the result matches the Nutri-Score printed on the package.
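To see how a single missing fiber value can move a product from C to A, here is a minimal Python sketch of the 2017 Nutri-Score for general solid foods (assumed here to be the version OFF computed at the time), written from the published point thresholds as we understand them. It is not OFF's Perl implementation and ignores special cases such as beverages, cheese and added fats. The nutrient values in the usage part are hypothetical, chosen to resemble a pumpernickel; they are not the values of the products linked above.

```python
# Minimal sketch of the 2017 Nutri-Score for general solid foods.
# Thresholds are per 100 g; a nutrient earns one point for every
# threshold its value strictly exceeds. Not OFF's Perl implementation.

ENERGY_KJ = [335, 670, 1005, 1340, 1675, 2010, 2345, 2680, 3015, 3350]
SAT_FAT_G = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
SUGARS_G = [4.5, 9, 13.5, 18, 22.5, 27, 31, 36, 40, 45]
SODIUM_MG = [90, 180, 270, 360, 450, 540, 630, 720, 810, 900]
FIBER_G = [0.9, 1.9, 2.8, 3.7, 4.7]  # AOAC method thresholds
PROTEIN_G = [1.6, 3.2, 4.8, 6.4, 8.0]


def points(value: float, thresholds: list) -> int:
    """Count how many thresholds the value strictly exceeds."""
    return sum(1 for t in thresholds if value > t)


def fruit_points(percent: float) -> int:
    # Fruits, vegetables, nuts: 1 point above 40 %, 2 above 60 %, 5 above 80 %.
    if percent > 80:
        return 5
    if percent > 60:
        return 2
    return 1 if percent > 40 else 0


def grade(score: int) -> str:
    """2017 grade boundaries for general solid foods."""
    for upper, letter in ((-1, "A"), (2, "B"), (10, "C"), (18, "D")):
        if score <= upper:
            return letter
    return "E"


def nutriscore(energy_kj, sat_fat_g, sugars_g, sodium_mg, fruits_pct, fiber_g, protein_g):
    negative = (points(energy_kj, ENERGY_KJ) + points(sat_fat_g, SAT_FAT_G)
                + points(sugars_g, SUGARS_G) + points(sodium_mg, SODIUM_MG))
    fruits = fruit_points(fruits_pct)
    fiber = points(fiber_g, FIBER_G)
    protein = points(protein_g, PROTEIN_G)
    # Protein is only subtracted when negative < 11 or fruit points are maxed;
    # fiber is subtracted in both cases.
    if negative < 11 or fruits == 5:
        score = negative - (fruits + fiber + protein)
    else:
        score = negative - (fruits + fiber)
    return score, grade(score)


if __name__ == "__main__":
    # Hypothetical pumpernickel-like values (not taken from the products above).
    common = dict(energy_kj=800, sat_fat_g=0.3, sugars_g=3.5,
                  sodium_mg=480, fruits_pct=0, protein_g=5.5)
    print(nutriscore(fiber_g=0.0, **common))  # → (4, 'C')
    print(nutriscore(fiber_g=7.9, **common))  # → (-1, 'A')
```

With these hypothetical values the negative points total 7 and protein earns 3, so 0 g of fiber gives a score of 4 (C), while 7.9 g earns the full 5 fiber points and gives -1 (A).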


To Reproduce

See above.

Expected behavior

The Open Food Facts server computes the correct Nutri-Score of A for the example above instead of C.

Screenshots

No response

Additional context

No response

Type of device

Browser

Browser version

No response

Number of products impacted

This should affect at least all whole-grain products, which are underrated because of this bug even though they are healthy.

Time per product

No response

aleene commented 1 year ago

Very good deduction. You can always use the Folksonomy to indicate that there is an error made by the producer (key producer_issue). Normally the error would be an inconsistent Nutri-Score, as the package and the calculation do not match, but after your deduction it is rather a case of missing fiber.

The next question is what OFF should do. In general, when no fiber has been specified, it means that there is no fiber, so the assumption of 0 is good. In the case of this category, OFF could either not do a calculation (as data is missing) or do an upper-limit calculation, as it does now.
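The two options can be made concrete. Under the 2017 Nutri-Score (assumed here), fiber is worth between 0 and 5 points and is subtracted in every branch of the formula, so when fiber is unknown the score can only be bounded: treating it as 0 gives the worst case, and the full 5 points gives the best case. A minimal Python sketch, where `score_without_fiber` is a hypothetical input (the score computed with 0 fiber points) and the thresholds are the published 2017 values as we understand them:

```python
from typing import Optional

# 2017 Nutri-Score fiber thresholds (AOAC method), in g/100 g: one point
# per threshold strictly exceeded, so fiber is worth 0 to 5 points.
FIBER_G = [0.9, 1.9, 2.8, 3.7, 4.7]
MAX_FIBER_POINTS = len(FIBER_G)


def fiber_points(fiber_g: float) -> int:
    return sum(1 for t in FIBER_G if fiber_g > t)


def grade(score: int) -> str:
    """2017 grade boundaries for general solid foods."""
    for upper, letter in ((-1, "A"), (2, "B"), (10, "C"), (18, "D")):
        if score <= upper:
            return letter
    return "E"


def grade_range(score_without_fiber: int, fiber_g: Optional[float]) -> tuple:
    """Return the (best, worst) possible grades.

    score_without_fiber: hypothetical input, the score computed with 0 fiber
    points (fiber is subtracted in every branch, so it can be applied after).
    fiber_g: declared fiber per 100 g, or None when the package gives "-".
    """
    if fiber_g is not None:
        known = grade(score_without_fiber - fiber_points(fiber_g))
        return known, known
    # Unknown fiber: the current "assume 0" result is only the worst case.
    best = grade(score_without_fiber - MAX_FIBER_POINTS)
    worst = grade(score_without_fiber)
    return best, worst


if __name__ == "__main__":
    print(grade_range(4, None))  # → ('A', 'C')
    print(grade_range(4, 7.9))   # → ('A', 'A')
```

When best and worst differ, OFF could show the range or withhold the grade (the "no calculation" option); when they agree, the missing fiber value does not change the result and the current behavior is harmless.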

In both cases it would be nice if we could add a warning to inform the user that there is an issue. But how should this warning be triggered? In the narrow case of missing fiber, we should indicate somewhere that this product should have a fiber value. The most logical place is the category taxonomy, so that the warning can be triggered for any product assigned to that category.

But what should we add to the taxonomy? A label, like "fiber rich", or a mean value, so that we can do a calculation and present an implied Nutri-Score?

And why stop there, now that we are at it? Maybe we can check whether the values fall within a range, and trigger a warning when a product falls outside it.

And in the broader case: why only fiber? We could do this for any nutritional value.
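The taxonomy-driven check described above (a flag saying that a category should declare a nutrient, plus a plausible range, for any nutrient) could look roughly like this Python sketch. The `Expectation` class, the `EXPECTATIONS` table and its range values are hypothetical placeholders, not data; in OFF such properties would have to live in the category taxonomy. Nutrient keys follow the `fiber_100g` style of OFF nutriment fields.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Expectation:
    required: bool                # products in the category should declare it
    low: Optional[float] = None   # plausible range per 100 g (placeholder)
    high: Optional[float] = None


# Hypothetical category properties; the numbers are placeholders, not data.
EXPECTATIONS = {
    "en:rye-breads": {"fiber_100g": Expectation(required=True, low=5.0, high=15.0)},
}


def nutrient_warnings(categories: list, nutriments: dict) -> list:
    """Return human-readable warnings for missing or out-of-range values."""
    warnings = []
    for category in categories:
        for nutrient, exp in EXPECTATIONS.get(category, {}).items():
            value = nutriments.get(nutrient)
            if value is None:
                if exp.required:
                    warnings.append(f"{nutrient}: missing, but expected for {category}")
                continue
            if exp.low is not None and value < exp.low:
                warnings.append(f"{nutrient}: {value} is below the expected range for {category}")
            if exp.high is not None and value > exp.high:
                warnings.append(f"{nutrient}: {value} is above the expected range for {category}")
    return warnings


if __name__ == "__main__":
    print(nutrient_warnings(["en:rye-breads"], {}))
    print(nutrient_warnings(["en:rye-breads"], {"fiber_100g": 7.9}))  # → []
```

Because the table is keyed by nutrient, extending the check beyond fiber only means adding entries, which matches the "any nutritional value" generalization; the hard part remains sourcing trustworthy ranges.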

In short: you have opened a can of worms here. Very interesting, but a large effort. Maybe we could define a student project for this.

github-actions[bot] commented 1 year ago

This issue is stale because it has been open 90 days with no activity.

benbenben2 commented 9 months ago

That is a good point @fabi003

I was thinking about replacing a missing fiber value with the average value for the product's category (the lowest category in the hierarchy, here https://world.openfoodfacts.org/category/rye-breads or https://world.openfoodfacts.org/category/pumpernickel). In this case, that would be rye-breads (8.36) or pumpernickel (10.5).

Challenges:
1) Which one to choose when, as in this case, there are two possibilities, rye-breads (8.36) or pumpernickel (10.5), which have exactly the same parents? Suggestion: take the average of the possible values.
2) If we assign a value to all products where fiber is missing based on existing averages, BUT those averages were wrong in the first place (for example, a category with just a few products, most of them with a wrong fiber value), that will completely skew the average, and it will be hard to roll back to a correct one.
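The proposal and its first challenge can be sketched in Python: find the product's lowest categories (those that are not a parent of another category on the same product), average their category means, and use the result as the estimated fiber value. As one possible way to limit the second challenge, the category averages are computed only from products that declare fiber themselves, with imputed values flagged so they never feed back into the mean. The `parents` mapping, the `fiber_imputed` flag and the product dict layout are hypothetical; only direct parents are checked, which simplifies a full taxonomy walk.

```python
from statistics import mean
from typing import Optional


def lowest_categories(categories: list, parents: dict) -> list:
    """Categories of the product that are not a direct parent of another
    category of the same product (simplification: no full ancestor walk)."""
    return [c for c in categories
            if not any(c in parents.get(other, set()) for other in categories)]


def category_mean(products: list, category: str) -> Optional[float]:
    """Average declared fiber in a category.

    Products whose fiber was imputed (hypothetical "fiber_imputed" flag)
    are skipped, so estimates never feed back into the average.
    """
    values = [p["fiber_100g"] for p in products
              if category in p["categories"]
              and "fiber_100g" in p
              and not p.get("fiber_imputed", False)]
    return mean(values) if values else None


def impute_fiber(categories: list, parents: dict, means: dict) -> Optional[float]:
    """Estimate fiber as the average of the lowest categories' means
    (challenge 1: several lowest categories sharing the same parents)."""
    values = [means[c] for c in lowest_categories(categories, parents) if c in means]
    return round(mean(values), 2) if values else None


if __name__ == "__main__":
    parents = {"en:rye-breads": {"en:breads"}, "en:pumpernickel": {"en:breads"}}
    cats = ["en:breads", "en:rye-breads", "en:pumpernickel"]
    means = {"en:rye-breads": 8.36, "en:pumpernickel": 10.5}
    print(impute_fiber(cats, parents, means))  # → 9.43
```

With the averages quoted above, rye-breads (8.36) and pumpernickel (10.5), the estimate is 9.43. Keeping the imputed flag also makes a later rollback possible: estimates can be recomputed or dropped without touching declared values.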

Another option, similar to what was done in the past for the Nutri-Score in https://github.com/openfoodfacts/openfoodfacts-server/pull/8360, where we added an "expected_nutriscore_grade:en" tag giving the expected Nutri-Score for the corresponding category: we could introduce a new tag in the category taxonomy with a default fiber value (for products where fiber is “-”). But where would that default value come from?

I don't think either of these two options is good, unfortunately :(