Creation of the benchmark and the evaluation algorithm used to evaluate the Spellcheck
Benchmark
The benchmark is composed of 247 lists of ingredients from 3 data sources:
30% of the old dataset of manually corrected French lists of ingredients from the previous work by Lucain W.; unmodified lists of ingredients were removed.
15 manually corrected lists of ingredients in different languages (used for prompt engineering OpenAI models on the Spellcheck task)
100 lists of ingredients tagged 50-percent-unknown, corrected with GPT-3.5 following the correction guidelines.
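The assembly of these three sources into one benchmark can be sketched as follows (the field names, the `BenchmarkItem` structure, and the filtering step are illustrative assumptions, not the actual pipeline):

```python
from dataclasses import dataclass

@dataclass
class BenchmarkItem:
    original: str   # raw list of ingredients
    reference: str  # manually validated correction
    lang: str       # language code, e.g. "fr"
    source: str     # "old-dataset", "prompt-set" or "gpt-3.5"

def build_benchmark(old_dataset, prompt_set, gpt35_set):
    """Merge the three data sources into a single benchmark.

    Unmodified lists from the old dataset (original == reference)
    carry no correction signal and are dropped.
    """
    benchmark = [it for it in old_dataset if it.original != it.reference]
    benchmark += list(prompt_set)
    benchmark += list(gpt35_set)
    return benchmark
```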
Argilla to validate the benchmark
The lists of ingredients corrected with GPT-3.5 were checked in Argilla and modified to respect the spellcheck guidelines.
Evaluation algorithm
An evaluation algorithm was created to estimate the performance of the Spellcheck.
It computes the precision and recall of the corrections over text sequence triples (Original, Reference, Prediction) using a tokenization and alignment algorithm.
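A minimal sketch of such a metric, assuming the three token sequences are already the same length (the function name and whitespace tokenization are illustrative; the actual algorithm additionally aligns tokens to handle insertions and deletions):

```python
def correction_metrics(original: str, reference: str, prediction: str):
    """Token-level precision/recall of a spellcheck correction.

    Assumes the three texts tokenize to equal-length sequences; a real
    implementation would first align tokens (e.g. with an edit-distance
    alignment) so insertions and deletions are handled.
    """
    orig, ref, pred = original.split(), reference.split(), prediction.split()
    tp = fp = fn = 0
    for o, r, p in zip(orig, ref, pred):
        if p != o:          # the model changed this token
            if p == r:
                tp += 1     # correct change
            else:
                fp += 1     # wrong or unnecessary change
        elif r != o:
            fn += 1         # a needed change was missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```

For example, if the model fixes "vegetal" and "suggar" but misses "sal,", precision is 1.0 (every change it made was right) while recall is 2/3 (one needed fix was missed).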