openfoodfacts / openfoodfacts-ai

This is a tracking repo for all our AI projects. 🍕 🤖🍼

Use LLMs to extract ingredient lists from raw text #309

Open raphael0202 opened 1 year ago

raphael0202 commented 1 year ago

Successful test using ChatGPT (GPT-3.5):

Extract ingredient lists from the following texts. The ingredient list should start with the first ingredient and include allergy, label or origin information if this information is provided just after the ingredient list.
The output format must be a JSON list containing one element per ingredient list. If there are ingredients in several languages, the JSON should contains as many elements as detected ingredient languages. Each JSON element should have two fields: 
- a "text" field containing the detected ingredient list. The text should be a substring of the original text, you must not alter the original text.
- a  "lang" field containing the detected language of the ingredient list.
Don't output anything else than the expected JSON.

(FI) Hapatettua vaniljakaurajäätelöä Ainesosat Kaurapohja (vesi, kaura), maltodekstriini, dekstroosi, kasviöljyt (kookos, rypsi), sokeri, tärkkelyssiirappi, vanilja (1,5 %), perunaproteiini, emulgointiaine (E471), stabilointiaineet (johanneksenleipäpuujauhe, guarkumi), suola, luontainen vanilja-aromi, hapate. Sisältää 5% kauraa (SV/DA/NO) Syrad vaniljhavreglass/Symmet vaniljehavreis Ingredienser: Havrebas (vatten/vand/vann, havre), maltodextrin, dextros, vegetabi olier (kokos, rybs), socker/sukker, stärkelsesirap/stivelsesirup, vanilj (1,5%), potas-kat potetprotein, emulgeringsmedel/emulgator (E471), stabiliseringsmedel/stabilisatore (fruktkärnmjöl/johannesbrødkernemel, guarkärnmjöl/guargummi, salt, naturligania symingskultur. Innehåller/Indeholder 5 % havre.  Ravintoarvot/Näringsvärde/ Næringsindhold/Næringsinnhold Energia/Energi Rasva/Fett/Fedt josta tyydyttynyttä / varav mättat fett/heraf Hiilihydraatit / Kolhydrat / Kulhydrat/Karbohydrat maettede fedtsyrer / hvorav mettede fettsyrer josta sokereita/varav sockerarter/ heraf sukkerarter/hvorav sukkerarter  Proteiini/Protein Suola/Salt  836 kJ/200  SR **S

[
    {
        "text": "Kaurapohja, maltodekstriini, dekstroosi, kasviöljyt (kookos, rypsi), sokeri, tärkkelyssiirappi, vanilja (1,5 %), perunaproteiini, emulgointiaine (E471), stabilointiaineet (johanneksenleipäpuujauhe, guarkumi), suola, luontainen vanilja-aromi, hapate",
        "lang": "fi"
    },
    {
        "text": "Havrebas, maltodextrin, dextros, vegetabi olier (kokos, rybs), socker, stärkelsesirap, vanilj (1,5%), potas-kat potetprotein, emulgeringsmedel (E471), stabiliseringsmedel (fruktkärnmjöl, guarkärnmjöl, salt, naturligania symingskultur",
        "lang": "sv/da/no"
    }
]

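The prompt requires each "text" value to be a substring of the original text, which can be checked mechanically. Below is a minimal sketch in Python, assuming the model reply is a JSON list as requested. `check_extraction` and the shortened sample strings are hypothetical and not part of any Open Food Facts code. Note that the Finnish result above would fail this check, since it drops "(vesi, kaura)" after "Kaurapohja".

```python
import json


def check_extraction(raw_text: str, model_output: str) -> list[dict]:
    """Parse the model's JSON reply and report, for each element,
    whether its "text" appears verbatim in the raw text."""
    items = json.loads(model_output)
    if not isinstance(items, list):
        raise ValueError("expected a JSON list")
    report = []
    for item in items:
        if not isinstance(item, dict):
            raise ValueError("expected each element to be a JSON object")
        text = item.get("text", "")
        # str.find returns -1 when the text is not a verbatim substring
        start = raw_text.find(text) if text else -1
        report.append({
            "lang": item.get("lang"),
            "is_substring": start != -1,
            "start": start,
        })
    return report


# Shortened, hypothetical sample based on the Finnish text above
raw = "Ainesosat Kaurapohja (vesi, kaura), maltodekstriini, dekstroosi. Sisältää 5% kauraa"
output = json.dumps([
    {"text": "Kaurapohja (vesi, kaura), maltodekstriini, dekstroosi", "lang": "fi"},
    {"text": "Kaurapohja, maltodekstriini, dekstroosi", "lang": "fi"},  # altered text
])
report = check_extraction(raw, output)
print(report)
```

The returned `start` offset could also be used to store a character span instead of the text itself, which makes it impossible for the model to alter the original wording.
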
Part of

raphael0202 commented 1 year ago

Another one:

Extract ingredient lists from the following texts. The ingredient list should start with the first ingredient and include allergy, label or origin information if this information is provided just after the ingredient list.
The output format must be a JSON list containing one element per ingredient list. If there are ingredients in several languages, the JSON should contains as many elements as detected ingredient languages. Each JSON element should have two fields: 
- a "text" field containing the detected ingredient list. The text should be a substring of the original text, you must not alter the original text.
- a  "lang" field containing the detected language of the ingredient list.
Don't output anything else than the expected JSON.

P  bir  Depuis 1976, notre chocolaterie liégeoise crée, produit en toute indépendance... de manière responsable et transparente.  Onze Luikse chocolaterie creëert en produceert sinds 1976 in alle onafhankelijkheid, op een verantwoorde en transparante manier.  Our chocolate factory, proudly based in Liège/has been producing and creating independently, responsibly and transparently since 1976.  FR-Chocolat noir 85% de cacao et éclats de caramel. Ingrédients: chocolat noir 85% de cacao (pâte de cacao, sucre, beurre de cacao, émulsifiant : lécithine de soja), éclat de caramel 19% (sucre, beurre de cacao, amidon de riz/sel de Guérande, acidifiant: bicarbonate de sodium), sel de Guérande. Cacao: 85% min. Peut contenir : des fruits à coque, lait, oeufs et céréales contenant du gluten.  NL - Pure chocolade 85% cacao en karamelstukjes. Ingrediënten: pure chocolade 85% cacao (cacaopasta, suiker, cacaoboter, emulgator: sojalecithine), karamel stukjes 19% (suiker, cacaoboter, rijstzetmeel, Guérande zout, voedingszuur: natriumbicarbonaat), Guérande zout. Cacao: 85% min. Kan bevatten: schaalvruchten, melk, eieren en gluten bevattende granen.  EN-Dark chocolate 85% cocoa and caramel pieces. Ingredients: dark chocolate 85% cocoa (cocoa paste, sugar, cocoa butter, emulsifier: soy lecithin), caramel pieces 19% (sugar, cocoa butter, rice starch, Guérande salt, acidifier: sodium bicarbonate), Guérande salt. Cocoa: 85% min. May contain: nuts, milk, eggs and cereals containing gluten.  DE-Bitterschokolade 85% Kakao und Karamell stückchen. Zutaten: Bitterschokolade 85% Kakao (Kakaomasse, Zucker, Kakaobutter, emulgator: Sojalecithin), Karamellblättchen 19% (Zucker, Kakaobutter, Reisstärke, Guérande-Salz, Säuerungsmittel: Natriumbicarbonat), Guérande-Salz. Kakao: 85% mindestens. Kann enthalten: Schalenfruchten, Milch, Eiern und Glutenhaltiges Getreide.  ES- Chocolate negro 85% de cacao y copos de caramelo. 
Ingredientes: chocolate negro 85% cacao (pasta de cacao, azúcar, manteca de cacao, emulsionante: lecitina de soja), copos de caramelo 19% (azúcar, manteca de cacao, almidón de arroz, sal de Guérande, acidificante: bicarbonato de sodio), sal de Guérande. Cacao: 85% min. Puede contener: frutos con cáscara, leche, huevos y cereales que contengan gluten.  يحسن استهلاكه قبل  IT-Cioccolato fondente 85% cacao e scoppio di caramello. Ingredienti: cioccolato fondente 85% cacao (pasta di cacao, zucchero, burro di cacao, emulsionante: lecitina di soia), schegge di caramello 19% (zucchero, burro di cacao, amido di riso, sale di Guérande, acidificante: bicarbonato di sodio), sale de Guérande. Cacao: 85% min. Può contenere: frutta a guscio, latte, uova e cereali contenenti glutine.  المكونات : شوكولا سوداء 85 % على الأقل من الكاكاو ( عجينة كاكاو, سكر, زبدة كاكاو, مستحلب : ليسيتين الصويا ), كاراميل مدقوق 19 % ( سكر, زبدة كاكاو, نشأ إرز, ملح, محمض : بيكربونات الصوديوم ), ملح. كاكاو : 85 % على الأقل. يمكن ان يحتوي على : فواكه جافة وحليب وبيض وحبوب محتوات على غولتين.  290CT2021  Consumare preferibilmente entro il: Consumir preferentemente antes del: Ten minste houdbaar tot: / Best before: A consommer de préférence avant le: Mindestens haltbar bis: 15 \"412038129373  FAIRTRADE  Max 16°C / 60.8°F  Made in Belgium Galler Chocolatiers S.A. chocolaterie Galler Rue de la Station 39 indépendante  COCOA  80g € 2.80Z  Cacao: le bilan de masse est utilisé pour correspondre au volume acheté aux conditions Fairtrade.  
4051 Vaux-sous-Chèvremont - Belgium Phone: +32 (0)4/367.22.11 -  Analyses nutritionnelles pour 100g Voedingswaarden per 100g / Nutrition facts per 100g/ Nährwerte pro 100g / Información nutricional por 100g Valori nutrizionali per 100g/100  Valeur énergétique Energetische waarde / Energy value / Brennwert / Valor energético / Valore energetico /  Matières Vetten/Fat/  grasses  Fett / Grasas /Grassi /  في  Dont acides gras saturés / Waarvan verzadigde vetzuren / Of which saturated fat / Davon gesättigte Fettsäuren / De los cuales ácidos grasos saturados / Di cui acidi grassi saturi /sl  Glucides / Koolhydraten / Carbohydrates / Kohlenhydrate/ Hidratos de carbono /  الـكـريـوهـيـدرات / Carboidrati  Dont sucres / Waarvan suikers/Of which sugars / Davon Zucker / De los cuales azúcares /  منها سكر/ Di cui zuccheri  Fibres/Vezels/Fibers/ Fasern/Fibras / Fibre/  صنع ببلجيكا  Protéines / Eiwitten / Protein / Eiweiß / Proteínas / Proteine / Sel/Zout/Salt / Salz/Sal / Sale/  2292 kJ 548 kcal  41,1g  24,7g  29,6g  25,3g  12,7g  8,0g  1,3g  TON  ww 

[
    {
        "text": "chocolat noir 85% de cacao (pâte de cacao, sucre, beurre de cacao, émulsifiant : lécithine de soja), éclat de caramel 19% (sucre, beurre de cacao, amidon de riz/sel de Guérande, acidifiant: bicarbonate de sodium), sel de Guérande",
        "lang": "fr"
    },
    {
        "text": "Pure chocolade 85% cacao (cacaopasta, suiker, cacaoboter, emulgator: sojalecithine), karamel stukjes 19% (suiker, cacaoboter, rijstzetmeel, Guérande zout, voedingszuur: natriumbicarbonaat), Guérande zout",
        "lang": "nl"
    },
    {
        "text": "dark chocolate 85% cocoa (cocoa paste, sugar, cocoa butter, emulsifier: soy lecithin), caramel pieces 19% (sugar, cocoa butter, rice starch, Guérande salt, acidifier: sodium bicarbonate), Guérande salt",
        "lang": "en"
    },
    {
        "text": "Bitterschokolade 85% Kakao (Kakaomasse, Zucker, Kakaobutter, emulgator: Sojalecithin), Karamellblättchen 19% (Zucker, Kakaobutter, Reisstärke, Guérande-Salz, Säuerungsmittel: Natriumbicarbonat), Guérande-Salz",
        "lang": "de"
    },
    {
        "text": "Chocolate negro 85% cacao (pasta de cacao, azúcar, manteca de cacao, emulsionante: lecitina de soja), copos de caramelo 19% (azúcar, manteca de cacao, almidón de arroz, sal de Guérande, acidificante: bicarbonato de sodio), sal de Guérande",
        "lang": "es"
    },
    {
        "text": "cioccolato fondente 85% cacao (pasta di cacao, zucchero, burro di cacao, emulsionante: lecitina di soia), schegge di caramello 19% (zucchero, burro di cacao, amido di riso, sale di Guérande, acidificante: bicarbonato di sodio), sale de Guérande",
        "lang": "it"
    },
    {
        "text": "شوكولا سوداء 85 % على الأقل من الكاكاو ( عجينة كاكاو, سكر, زبدة كاكاو, مستحلب : ليسيتين الصويا ), كاراميل مدقوق 19 % ( سكر, زبدة كاكاو, نشأ إرز, ملح",
        "lang": "ar"
    }
]

[
    {
        "text": "FR-Chocolat noir 85% de cacao et éclats de caramel. Ingrédients: chocolat noir 85% de cacao (pâte de cacao, sucre, beurre de cacao, émulsifiant : lécithine de soja), éclat de caramel 19% (sucre, beurre de cacao, amidon de riz/sel de Guérande, acidifiant: bicarbonate de sodium), sel de Guérande.",
        "lang": "FR"
    }
]

teolemon commented 1 year ago

ChatGPT 3.5 or 4, @raphael0202?

raphael0202 commented 1 year ago

@teolemon 3.5
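
The "lang" values in the outputs above are not uniform: "fi" and "fr" are lowercase two-letter codes, the combined Scandinavian list is tagged "sv/da/no", and one output uses uppercase "FR". A small sketch of normalizing them before use, assuming lowercase two-letter codes are what downstream code wants (an assumption, not something stated in this thread). `normalize_lang` is a hypothetical helper.

```python
import re


def normalize_lang(lang: str) -> list[str]:
    """Split a "lang" value such as "sv/da/no" or "FR" into
    lowercase two-letter codes, rejecting anything else."""
    # Accept "/" or "," as separators between several languages
    codes = [part.strip().lower() for part in re.split(r"[/,]", lang) if part.strip()]
    bad = [code for code in codes if not re.fullmatch(r"[a-z]{2}", code)]
    if bad:
        raise ValueError(f"unexpected language code(s): {bad}")
    return codes


print(normalize_lang("FR"))
print(normalize_lang("sv/da/no"))
```

This only checks the shape of the code, not whether it is a real language code or whether it matches the language of the extracted text.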