openfoodfacts / openfoodfacts-server

Open Food Facts database, API server and web interface - 🐪🦋 Perl, CSS and JS coders welcome 😊 For helping in Python, see Robotoff or taxonomy-editor
http://openfoodfacts.github.io/openfoodfacts-server/
GNU Affero General Public License v3.0
664 stars 392 forks

Add specific thresholds for detecting quality errors in vitamins and other components [quality data] #11083

Open jusdekiwi opened 1 day ago

jusdekiwi commented 1 day ago

Problem

The current threshold for triggering a quality error on a vitamin's amount is 105 g per 100 g. I think it is not precise enough and we can do much better!

Image

In the screenshot below, only 5 values raise an error, although all of them should, since the unit is wrong. Here the problem comes from the units, but it could just as well come from a typo, as it often does in nutrition facts errors.

Image
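To make the gap concrete, here is a minimal Python sketch of a flat check at 105 g per 100 g. It is not the server's actual (Perl) code: the function name, the strict `>` comparison, and the three sample values are assumptions made up for illustration.

```python
# Flat threshold described in the issue: 105 g per 100 g.
FLAT_THRESHOLD_G = 105.0


def exceeds_flat_threshold(value_g: float) -> bool:
    """Return True when a value in g per 100 g would trigger the flat check.

    The strict '>' comparison is an assumption, not taken from the server code.
    """
    return value_g > FLAT_THRESHOLD_G


# Hypothetical values that were meant as µg or mg but were saved as g.
entered_as_grams = {
    "vitamin-a": 800.0,  # meant 800 µg
    "vitamin-c": 80.0,   # meant 80 mg
    "vitamin-d": 5.0,    # meant 5 µg
}

flagged = {
    name: exceeds_flat_threshold(value)
    for name, value in entered_as_grams.items()
}
```

In this sketch only the 800 g entry is flagged, while 80 g and 5 g pass silently even though both are impossible amounts for a vitamin.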

Proposed solution

I suggest we establish a specific threshold for each vitamin so we can detect new quality errors. I've extracted the max values for each vitamin so we can set a threshold value above the max known value.

Image

Retrievable data I've extracted from CIQUAL and USDA (the proposed threshold here is 4 times the maximum known value): nutrient max values.ods

Expected outcome

Many new errors would be raised, but we would be able to fix the products' data more easily and so improve the overall quality of the database :)

Note: I've arbitrarily chosen a factor of 4 for the example but we should discuss together which value would be the best.
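As a rough sketch of the proposal, the per-nutrient threshold could be computed as the maximum known value multiplied by the factor. The Python below is only an illustration: the function names are hypothetical, and the maximum values are placeholders, not figures taken from the attached spreadsheet.

```python
# Factor from the issue; arbitrary and still open for discussion.
FACTOR = 4.0


def threshold_for(max_known_g: float, factor: float = FACTOR) -> float:
    """Threshold in g per 100 g: the maximum known value times the factor."""
    return max_known_g * factor


def quality_errors(nutriments: dict, max_known: dict, factor: float = FACTOR) -> list:
    """Return the sorted names of nutrients whose value exceeds their threshold.

    Nutrients without a known maximum are skipped.
    """
    errors = []
    for name, value in nutriments.items():
        if name in max_known and value > threshold_for(max_known[name], factor):
            errors.append(name)
    return sorted(errors)


# Placeholder maxima in g per 100 g (NOT the values from the spreadsheet).
max_known = {"vitamin-c": 1.0, "vitamin-d": 0.001}

# Hypothetical product: vitamin C saved as 80 g instead of 80 mg.
nutriments = {"vitamin-c": 80.0, "vitamin-d": 0.0005, "iron": 3.0}

errors = quality_errors(nutriments, max_known)
```

With these placeholder numbers the vitamin C entry is flagged, since 80 g is above the 4 g threshold, whereas the flat 105 g check would have let it through.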

aleene commented 1 day ago

We should exclude the supplements category for this.

I would rather start with a low value and increase it if needed. I assume this will be added to the vitamins/minerals taxonomy.
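A minimal Python sketch of the exclusion, under stated assumptions: the product is a plain dict with `categories_tags` and `nutriments` keys, the tag `en:dietary-supplements` stands in for the supplements category, and the thresholds would come from wherever they end up being stored (for example the taxonomy). All names and numbers here are hypothetical.

```python
# Assumed category tag for supplements; adjust to the real taxonomy entry.
EXCLUDED_CATEGORIES = {"en:dietary-supplements"}


def should_check(categories_tags) -> bool:
    """Return False when the product belongs to an excluded category."""
    return not EXCLUDED_CATEGORIES.intersection(categories_tags)


def vitamin_errors(product: dict, thresholds: dict) -> list:
    """Return the sorted nutrient names above their threshold.

    Products in an excluded category are never checked.
    """
    if not should_check(product.get("categories_tags", [])):
        return []
    nutriments = product.get("nutriments", {})
    return sorted(
        name
        for name, value in nutriments.items()
        if name in thresholds and value > thresholds[name]
    )


# Placeholder threshold in g per 100 g, for illustration only.
thresholds = {"vitamin-c": 4.0}

juice = {"categories_tags": ["en:juices"], "nutriments": {"vitamin-c": 80.0}}
tablet = {
    "categories_tags": ["en:dietary-supplements"],
    "nutriments": {"vitamin-c": 80.0},
}

juice_errors = vitamin_errors(juice, thresholds)
tablet_errors = vitamin_errors(tablet, thresholds)
```

Keeping the exclusion separate from the threshold lookup means the thresholds can start low and be raised later without touching the category rule.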