Creation of the benchmark and the evaluation algorithm used to evaluate the Spellcheck
Benchmark
The benchmark is composed of 247 lists of ingredients from 3 data sources:
30% of the old dataset of manually corrected French lists of ingredients from the previous work by Lucain W.; unmodified lists of ingredients were removed.
15 manually corrected lists of ingredients in different languages (used for prompt engineering OpenAI models on the Spellcheck task)
100 lists of ingredients tagged 50-percent-unknown, corrected with GPT-3.5 following the correction guidelines.
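The assembly of these three sources into one benchmark can be sketched as follows (the field names, the `BenchmarkItem` structure, and the filtering step are illustrative assumptions, not the actual pipeline):

```python
from dataclasses import dataclass

@dataclass
class BenchmarkItem:
    original: str   # raw list of ingredients
    reference: str  # manually validated correction
    lang: str       # language code, e.g. "fr"
    source: str     # "old-dataset", "prompt-set" or "gpt-3.5"

def build_benchmark(old_dataset, prompt_set, gpt35_set):
    """Merge the three data sources into a single benchmark.

    Unmodified lists from the old dataset (original == reference)
    carry no correction signal and are dropped.
    """
    benchmark = [it for it in old_dataset if it.original != it.reference]
    benchmark += list(prompt_set)
    benchmark += list(gpt35_set)
    return benchmark
```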
Argilla to validate the benchmark
The lists of ingredients corrected with GPT-3.5 were checked in Argilla and modified to respect the spellcheck guidelines.
Evaluation algorithm
An evaluation algorithm was created to estimate the performance of the Spellcheck.
It computes the precision and recall of the corrections over text sequence triples (Original, Reference, Prediction) using a tokenization and alignment algorithm.
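A minimal sketch of such a metric, assuming the three token sequences are already the same length (the function name and whitespace tokenization are illustrative; the actual algorithm additionally aligns tokens to handle insertions and deletions):

```python
def correction_metrics(original: str, reference: str, prediction: str):
    """Token-level precision/recall of a spellcheck correction.

    Assumes the three texts tokenize to equal-length sequences; a real
    implementation would first align tokens (e.g. with an edit-distance
    alignment) so insertions and deletions are handled.
    """
    orig, ref, pred = original.split(), reference.split(), prediction.split()
    tp = fp = fn = 0
    for o, r, p in zip(orig, ref, pred):
        if p != o:          # the model changed this token
            if p == r:
                tp += 1     # correct change
            else:
                fp += 1     # wrong or unnecessary change
        elif r != o:
            fn += 1         # a needed change was missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```

For example, if the model fixes "vegetal" and "suggar" but misses "sal,", precision is 1.0 (every change it made was right) while recall is 2/3 (one needed fix was missed).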