Generate and import missing image embeddings

The new product categorizer was deployed in March 2023 and has since been categorizing newly uploaded products. However, we still don't have predictions for the rest of the database. The model takes as input the embeddings of the product's 10 most recent images (see https://openfoodfacts.github.io/robotoff/explanations/category-prediction/, section "ML prediction", for more information about the model). To predict categories on the full dataset, we first need to generate and import embeddings for all images that don't have one yet, so that category detection can be launched. The model used to generate the embeddings is available here: https://github.com/openfoodfacts/robotoff-models/releases/tag/clip-vit-base-patch32. See the Robotoff codebase for the preprocessing code.
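
A minimal sketch of how the "10 most recent images" of a product could be selected from its product document. It assumes that raw images are the purely numeric keys of the product's `images` field (selected images use keys such as `front_fr`) and that a higher image ID means a more recent upload. `most_recent_image_ids` is a hypothetical helper, not necessarily how Robotoff selects images.

```python
from typing import Any, Dict, List

# The issue states the model uses the 10 most recent images of the product.
MAX_IMAGES = 10


def most_recent_image_ids(product: Dict[str, Any], max_images: int = MAX_IMAGES) -> List[str]:
    """Return the IDs of the product's most recent raw images, newest first.

    Assumption (hypothetical helper): raw images are the numeric keys of
    product["images"], and a higher ID means a more recent upload.
    """
    images = product.get("images", {})
    # Skip selected images such as "front_fr", which point to a raw image via "imgid".
    raw_ids = [key for key in images if key.isdigit()]
    # Sort numerically, not lexicographically ("10" must come after "9").
    raw_ids.sort(key=int, reverse=True)
    return raw_ids[:max_images]
```

Sorting on the integer value matters: a plain string sort would place image "9" after image "10".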

Here is a list of all the missing image paths: source_images.txt.gz
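
A sketch for reading the attached list, assuming `source_images.txt.gz` contains one image path per line, laid out like the Open Food Facts image folders (`/325/039/107/3213/1.jpg` is a hypothetical example entry). Both helpers are hypothetical; streaming the file line by line avoids loading the whole list into memory.

```python
from __future__ import annotations

import gzip
from pathlib import Path
from typing import Iterator


def iter_image_paths(list_path: str | Path) -> Iterator[str]:
    """Yield non-empty lines from the gzipped image list.

    Assumption: the file contains one image path per line.
    """
    with gzip.open(list_path, "rt", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                yield line


def parse_image_path(path: str) -> tuple[str, str]:
    """Split a path such as "/325/039/107/3213/1.jpg" into (barcode, image_id).

    Assumption: barcode folder(s) followed by "<image_id>.jpg".
    """
    *folders, filename = path.strip("/").split("/")
    image_id, _, ext = filename.partition(".")
    if not folders or ext != "jpg" or not image_id.isdigit():
        raise ValueError(f"unexpected image path: {path!r}")
    # Re-join the split barcode folders into the full barcode.
    return "".join(folders), image_id
```

Invalid lines raise `ValueError`, so they can be logged and skipped rather than silently producing a wrong barcode.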

Here is a tutorial on how to download images from Open Food Facts: https://openfoodfacts.github.io/openfoodfacts-server/api/how-to-download-images/
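
Following the folder rule described in the tutorial (barcodes longer than 8 digits are split into three 3-digit folders plus the remainder), image URLs can be built and fetched as below. This is a sketch under stated assumptions: the helper names and the User-Agent value are hypothetical, and the splitting rule should be checked against the current version of the tutorial before a full run.

```python
import urllib.request
from typing import Optional

IMAGES_BASE_URL = "https://images.openfoodfacts.org/images/products"
# Hypothetical identifier; the Open Food Facts API docs ask clients for a descriptive User-Agent.
USER_AGENT = "robotoff-embedding-backfill/0.1 (contact@example.com)"


def barcode_to_folder(barcode: str) -> str:
    """Map a barcode to its image folder, e.g. 3435660768163 -> 343/566/076/8163."""
    if len(barcode) <= 8:
        # Assumption from the tutorial: short barcodes are used as a single folder name.
        return barcode
    return "/".join([barcode[:3], barcode[3:6], barcode[6:9], barcode[9:]])


def image_url(barcode: str, image_id: str, size: Optional[str] = None) -> str:
    """Build the URL of a raw image (<id>.jpg) or a resized one (<id>.<size>.jpg)."""
    filename = f"{image_id}.jpg" if size is None else f"{image_id}.{size}.jpg"
    return f"{IMAGES_BASE_URL}/{barcode_to_folder(barcode)}/{filename}"


def download_image(url: str, timeout: float = 30.0) -> bytes:
    """Fetch the image bytes; raises urllib.error.HTTPError on 4xx/5xx responses."""
    request = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()
```

For example, `image_url("3435660768163", "1", "400")` points to the 400 px version of image 1, while omitting `size` points to the full-resolution original.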

openfoodfacts/robotoff issue #1177