openfoodfacts / robotoff

Real-time and batch prediction service for Open Food Facts
GNU Affero General Public License v3.0

Generate and import missing image embeddings #1177

Open raphael0202 opened 11 months ago

raphael0202 commented 11 months ago

The new product categorizer was deployed in March 2023, and since then it has been categorizing newly uploaded products. However, we still don't have predictions for the rest of the database. Its inputs include the embeddings of the product's 10 most recent images (see https://openfoodfacts.github.io/robotoff/explanations/category-prediction/, section "ML prediction", for more information about the model). To run category detection on the full dataset, we first need to generate and import image embeddings for all images that don't have one yet. The model used to generate the embeddings is stored here: https://github.com/openfoodfacts/robotoff-models/releases/tag/clip-vit-base-patch32. See the Robotoff codebase for the preprocessing code.
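A minimal sketch of the "10 most recent images" input described above, showing how the images could be picked and their embeddings packed into a fixed-size input. It assumes that raw uploads in a product's `images` field have numeric keys with an `uploaded_t` timestamp, and that each CLIP ViT-B/32 image embedding has 512 dimensions. The padding-plus-mask layout and the helper names `most_recent_image_ids` and `pad_embeddings` are hypothetical; the real preprocessing lives in the Robotoff codebase.

```python
from typing import Dict, List, Sequence, Tuple

MAX_IMAGES = 10      # the categorizer uses the 10 most recent images
EMBEDDING_DIM = 512  # assumption: CLIP ViT-B/32 image embedding size


def most_recent_image_ids(images: Dict[str, dict], max_images: int = MAX_IMAGES) -> List[str]:
    """Return the IDs of the most recently uploaded raw images, newest first.

    Assumption: raw uploads have numeric keys ("1", "2", ...) and an
    "uploaded_t" Unix timestamp; selected images such as "front_fr" are skipped.
    """
    raw = [
        (int(meta.get("uploaded_t", 0)), int(image_id))
        for image_id, meta in images.items()
        if image_id.isdigit()
    ]
    # Newest upload first; ties are broken by the higher image ID.
    raw.sort(reverse=True)
    return [str(image_id) for _, image_id in raw[:max_images]]


def pad_embeddings(
    embeddings: Sequence[Sequence[float]],
    max_images: int = MAX_IMAGES,
    dim: int = EMBEDDING_DIM,
) -> Tuple[List[List[float]], List[int]]:
    """Pack up to `max_images` embeddings into a fixed-size input plus a 0/1 mask.

    Hypothetical layout: real rows first, then zero rows; mask is 1 for real rows.
    """
    rows = [list(e) for e in embeddings[:max_images]]
    for row in rows:
        if len(row) != dim:
            raise ValueError(f"expected embedding of size {dim}, got {len(row)}")
    n_missing = max_images - len(rows)
    mask = [1] * len(rows) + [0] * n_missing
    rows.extend([0.0] * dim for _ in range(n_missing))
    return rows, mask
```

With a mask, products with fewer than 10 images can still be fed to a model that expects a fixed input shape, while the zero rows are marked as absent.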

Here is a list of all the missing image paths: source_images.txt.gz
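A minimal sketch for streaming the attached path list in batches, so that embeddings can be generated and imported incrementally rather than all at once. It assumes `source_images.txt.gz` is a gzip-compressed text file with one image path per line; the function name `iter_path_batches` and the default batch size are hypothetical.

```python
import gzip
from typing import Iterator, List


def iter_path_batches(gz_path: str, batch_size: int = 64) -> Iterator[List[str]]:
    """Yield batches of image paths from a gzip-compressed text file.

    Assumption: one image path per line; blank lines are ignored.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    batch: List[str] = []
    # "rt" opens the gzip file in text mode so lines are decoded as str.
    with gzip.open(gz_path, "rt", encoding="utf-8") as f:
        for line in f:
            path = line.strip()
            if not path:
                continue
            batch.append(path)
            if len(batch) == batch_size:
                yield batch
                batch = []
    # Emit the final, possibly smaller, batch.
    if batch:
        yield batch
```

Because the file is read lazily, memory use stays bounded by the batch size, which matters when the list covers most of the database.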

Here is a tutorial on how to download images on Open Food Facts: https://openfoodfacts.github.io/openfoodfacts-server/api/how-to-download-images/
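A minimal sketch of building an image URL from a barcode and image ID, following the folder-splitting scheme described in the tutorial: barcodes longer than 8 digits are split as 3/3/3/rest, and shorter ones are used unchanged. This sketch does not handle any zero-padding of barcodes, so check the tutorial for the exact rules. The helper names are hypothetical.

```python
from typing import Optional

IMAGE_BASE_URL = "https://images.openfoodfacts.org/images/products"


def barcode_to_folder(barcode: str) -> str:
    """Split a barcode into the folder path used for product images."""
    if not barcode.isdigit():
        raise ValueError(f"barcode must contain only digits: {barcode!r}")
    if len(barcode) <= 8:
        return barcode
    # 3 digits / 3 digits / 3 digits / rest, e.g. 343/566/076/8163
    return f"{barcode[:3]}/{barcode[3:6]}/{barcode[6:9]}/{barcode[9:]}"


def image_url(barcode: str, image_id: str, resolution: Optional[int] = None) -> str:
    """URL of a raw image: "<id>.jpg" for full size, "<id>.<res>.jpg" for a resized copy."""
    suffix = f"{image_id}.{resolution}.jpg" if resolution else f"{image_id}.jpg"
    return f"{IMAGE_BASE_URL}/{barcode_to_folder(barcode)}/{suffix}"
```

For example, `image_url("3435660768163", "1")` gives `https://images.openfoodfacts.org/images/products/343/566/076/8163/1.jpg`.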

alexgarel commented 9 months ago

As requested by Christelle, here are some embeddings from production.

I generated them using this code:

The corresponding images are at: