Open rugk opened 11 months ago
@raphael0202
@rugk we plan to extract automatically nutrition next year using Robotoff (our machine learning system)
@rugk Yes it's a project we would like to do in 2024! Is it something you would be interested in contributing to?
Likely not technically, but testing for sure.
@raphael0202 is actively working on this. More updates soon.
The nutrient extraction model was deployed and integrated into Robotoff. For every new image, we run the model and generate a prediction. An insight is generated if at least one of the extracted nutrient values is not present in the product's current nutrients.
To get the insights:

`GET https://robotoff.openfoodfacts.org/api/v1/insights?insight_types=nutrient_extraction&barcode={BARCODE}`
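For anyone scripting against this route, here is a minimal stdlib-only Python sketch; the helper names are mine, not part of off-dart or Robotoff:

```python
import json
import urllib.parse
import urllib.request

ROBOTOFF_API = "https://robotoff.openfoodfacts.org/api/v1"

def insights_url(barcode: str) -> str:
    """Build the insights URL; note the two parameters are joined with '&'."""
    query = urllib.parse.urlencode(
        {"insight_types": "nutrient_extraction", "barcode": barcode}
    )
    return f"{ROBOTOFF_API}/insights?{query}"

def fetch_nutrient_insights(barcode: str, timeout: float = 10.0) -> list:
    """Return the list of nutrient-extraction insights for a product."""
    with urllib.request.urlopen(insights_url(barcode), timeout=timeout) as resp:
        return json.load(resp).get("insights", [])
```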
Nutrient values are in `insight.data`. It contains:

- `entities`: a subset of the extracted entities at different processing steps (`raw`, `aggregated`, `postprocessed`). We only have `postprocessed` here; it's mostly useful as debug information, so we can ignore this field.
- `nutrients`: a dictionary mapping each nutrient name to a dict containing:
  - `value`: the value to add, without the unit
  - `unit`: the unit (can be `g`, `mg`, `µg` or `null`). If it's `null`, it's because we couldn't extract it from the image (either it's missing, the model was wrong, or the OCR result was not good enough). In that case I think we can safely use the "default" unit, which depends on the nutrient (as is done on Product Opener).
  - `score`: the entity score. Maybe not really relevant here, as this score is not calibrated (most values are > 0.98).
  - `char_start`, `char_end`: start and end character offsets in the original text
  - `start`, `end`: start and end word offsets in the original text

cc @monsieurtanuki @g123k
We would go for an "Extract Nutrition Facts" button, very much like the one for ingredients, only enabled if a precomputed nutrition insight is already available (otherwise greyed out). That's a way to keep the user in charge, and avoid potential backlash over "unwanted/unhelpful help" from Robotoff.
The model is not language-dependent, so we don't need to care about image selection in a specific language.
Values newly provided by the model would be shown in orange; values the model also provided but that are unchanged would stay normal.
In the future, we might nudge the user to take a new photo if we deem the current one too old, but the result won't be instant (photo, background task, inference, reloading). We might use animations on the button and/or the fields to hint that new values have arrived; on the button because all the updated fields might not be visible above the fold.
@raphael0202
> For every new image

No history, then.

> we run the model on it and generate a prediction

How fast? Like, 10 seconds?

> An insight is generated if in the extracted nutrient values, at least one value is not present in the current nutrients.

Of course that'll make more sense for new products. What if the Robotoff value is different from the current value: do you still send it?
> No history, then.

I'm going to process the full image backlog in the coming weeks.

> How fast? Like, 10 seconds?

Yes, about 10s (we're still running on CPU).

> Of course that'll make more sense for new products.

Just to be clear, if the product has no nutrition values, we still generate an insight of course.

> What if the Robotoff value is different from the current value: do you still send it?

Yes, we currently only generate an insight if the model predicted nutrition values that are not present in the original product. Note that the model can extract both `_serving` and `_100g` values.
We discussed the integration the other day with @teolemon, and we came to the conclusion that when clicking on this "extract" button, we could overwrite the product nutrient values that conflict with the model prediction (i.e. values that are already present).
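In code, that merge rule is tiny. A sketch under the behaviour described above (the function name is mine, not off-dart or Robotoff API):

```python
def apply_extraction(product_nutriments: dict, predicted: dict) -> dict:
    """Merge model-predicted nutrient values into the product's nutriments.

    Values already present on the product are overwritten by the
    prediction; values the product was missing are added.
    """
    merged = dict(product_nutriments)
    merged.update(predicted)  # the model prediction wins on conflict
    return merged
```

For example, merging a prediction of `{"proteins_100g": 11, "salt_100g": 1.2}` into a product that already has `{"proteins_100g": 10}` yields `{"proteins_100g": 11, "salt_100g": 1.2}`.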
Also, we consider images from newest to oldest, which means the image we're extracting nutrition values from is not necessarily the selected image, as we can consider a more recent one.
The image used is indicated in the `insight.source_image` field returned by the route.
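If a client wants to display that image, `source_image` is a path fragment. Assuming the usual Open Food Facts image host (worth double-checking against the official API docs), the full URL can be built as:

```python
# Assumed image host; verify against the Open Food Facts API documentation.
IMAGES_BASE = "https://images.openfoodfacts.org/images/products"

def source_image_full_url(source_image: str) -> str:
    """Join a source_image path (e.g. "/301/762/042/2003/1.jpg") to the host."""
    return IMAGES_BASE + source_image
```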
Here are some mockups done by @teolemon to illustrate the behaviour we discussed about :)
Behaviour if no insight is available (greyed button):
Behaviour if an insight is available:
Behaviour once the user clicked on the button:
The idea would be to perform the `GET https://robotoff.openfoodfacts.org/api/v1/insights?insight_types=nutrient_extraction?barcode={BARCODE}` either before (on product scan?) or after the user goes on the nutrition page.
Before is maybe not the best idea, as the model could run in the meantime. If it's after, we should probably add a loader to the extract button to show we're performing the request.
The idea of greying out the button is to avoid disappointing the user when the model failed to extract anything.
> I'm going to process the full image backlog in the coming weeks.

Cool!

> Yes, about 10s (we're still running on CPU)

OK

> Just to be clear, if the product has no nutrition values, we still generate an insight of course.

OK
> Yes, we currently only generate an insight if the model predicted nutrition values that are not present in the original product. Note that the model can extract both `_serving` and `_100g` values. We discussed a bit the integration the other day with @teolemon, and we came to the conclusion that when clicking on this "extract" button, we could overwrite the product nutrient values that conflict (=because they are already present) with the model prediction.
A bit confusing. It's not clear what you do if, in OFF, the product has 10g of proteins and Robotoff guesses it's 11g: the value is present but is different.
Not focused on the UI/UX for the moment. I'm rather focused on the off-dart aspect, and I have trouble testing the feature with GETs: any hint?
> A bit confusing. Not clear what you do if in off the product has 10g of proteins and robotoff guesses it's 11g: the value is present but is different.
Here we would overwrite the value of the nutrient.
> I'm rather focused on the off-dart aspect, and I have trouble testing the feature with GETs: any hint?
A typo slipped into the URL (`?` instead of `&`) ;)
Problem
I'm always frustrated when I have to enter the nutrition facts manually. :wink:
Proposed solution
Especially given the recent developments of machine learning models getting more powerful, I guess automatic OCR/ML recognition would be possible? Couldn't you train an ML model on your Open Food Facts data?
Additional context
Of course it has to be corrected manually, but it would already be a good starter.
Mockups
N/A (likely just similar to the other OCR features, e.g. the one for ingredients)