openfoodfacts / open-prices

An open database of food prices - 🧾💸💰🏷️🤑🍽️
GNU Affero General Public License v3.0

Proof: send to Google Cloud Vision, and store the matching JSON output #320

Open teolemon opened 2 weeks ago

teolemon commented 2 weeks ago

Problem

Links

Why

teolemon commented 1 week ago

PXL_20240622_190451291.jpg

Store the number of items, and make it available so that we can nudge the user properly
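A minimal sketch of how a stored item count could drive such a nudge. Everything here is hypothetical: `ProofProgress`, `receipt_item_count`, `prices_added` and `nudge_message` are illustrative names, not existing open-prices fields or functions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProofProgress:
    # Hypothetical fields: number of items detected on the receipt
    # (None if unknown) and number of prices the user has added so far.
    receipt_item_count: Optional[int]
    prices_added: int

    def remaining(self) -> Optional[int]:
        """Prices still to add, or None when the item count is unknown."""
        if self.receipt_item_count is None:
            return None
        return max(self.receipt_item_count - self.prices_added, 0)


def nudge_message(progress: ProofProgress) -> str:
    """Build a short message telling the user how far along they are."""
    remaining = progress.remaining()
    if remaining is None:
        return "Add the prices from this receipt"
    if remaining == 0:
        return "All prices from this receipt have been added"
    return f"{remaining} of {progress.receipt_item_count} prices still to add"
```

Keeping the count optional means proofs without a detected count still get a generic prompt instead of a wrong number.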

raphodn commented 2 days ago

Quick test with Google Cloud Vision:

Input: image

Output (after replacing "\n" with " "):

SATORIZ COMBOIRE Chers clientes et clients Le magasin est ouvert de 8h30 à19h30 Merci de vo tre compréhention 12 Rue des Montagnes de Lans 38 130 ECHIROLLES 04.76.40.42 34 Impression sur Caisse N : 1 Date Impression : 27/06/2024 a 18:17:34 Ticket de la caisse N : 1 Ticket: 7658707 Date Ticket: 27/06/2024 18:15:37 Serveur : ENORA Ticket ENCAISSER QTE PU € PP € 0.464 kg KA RE FOURRE CH 10.70 4.96 0.476 kg KA RE FOURRE CH 10.70 5.09 0.238 kg LEVURE MALTEE V 29.95 7.13 0.216 kg RAISINS SULTANI 7.20 1.56 1 un POIS CHICHES 24 1.15 1.15 1 un TOMATES CONCASS 1.05 1.05 1 un POIS CHICHES 24 1.15 1.15 1 un TOMATES PELEES 1.05 1.05 1 un MAYONNAISE NATU 4.55 4.55 1 un MOUTARDE DIJON 1.80 1.80 (Remise de -14.29%) 1 un MOUTARDE ANCIEN 1.80 1.80 (Remise de -14.29%) 1 un LAIT COCO 17% 2 1.05 1.05 0.422 kg COURGETTE 3.60 1.52 0.262 kg AUBERGINE 4.50 1.18 0.496 kg SUCRE CANNE ROU 2.25 1.12 0.788 kg RIZ BASMATI BLA 3.85 3.03 0.210 kg ABRICOTS BRUNS 17.15 3.60 0.138 kg MANGUES vrac 17.95 2.48 0.328 kg KAOKA PEPITES 6 15.35 5.03 Client N 1 50.30 Total Ttc 50.30€ Carte Bancaire 50.30€ TVA 5.50 % Ht Tva 47.68 2.62 Ouvert de 08H30 à 19h30 Non-Stop du lundi au samedi. Aucun remboursement ou echange sans ticket de ca isse et sur la librairie. 28/06/2024 18:21 TO00007658707*
Output with "\n" preserved:

SATORIZ COMBOIRE
Chers clientes et clients
Le magasin est ouvert de 8h30 à19h30 Merci de vo
tre compréhention
12 Rue des Montagnes de Lans
38 130 ECHIROLLES
04.76.40.42 34
Impression sur Caisse N : 1
Date Impression : 27/06/2024 a 18:17:34
Ticket de la caisse N : 1
Ticket: 7658707
Date Ticket: 27/06/2024 18:15:37
Serveur : ENORA
Ticket ENCAISSER
QTE
PU €
PP €
0.464 kg KA RE FOURRE CH
10.70
4.96
0.476 kg KA RE FOURRE CH
10.70
5.09
0.238 kg LEVURE MALTEE V
29.95
7.13
0.216 kg RAISINS SULTANI
7.20
1.56
1 un POIS CHICHES 24
1.15
1.15
1 un TOMATES CONCASS
1.05
1.05
1
un POIS CHICHES 24
1.15
1.15
1
un TOMATES PELEES
1.05
1.05
1
un MAYONNAISE NATU
4.55
4.55
1
un MOUTARDE DIJON
1.80
1.80
(Remise de -14.29%)
1 un MOUTARDE ANCIEN 1.80 1.80
(Remise de -14.29%)
1 un LAIT COCO 17% 2 1.05
1.05
0.422 kg COURGETTE
3.60
1.52
0.262 kg AUBERGINE
4.50
1.18
0.496 kg SUCRE CANNE ROU
2.25
1.12
0.788 kg RIZ BASMATI BLA
3.85
3.03
0.210 kg ABRICOTS BRUNS
17.15
3.60
0.138 kg MANGUES vrac
17.95
2.48
0.328 kg KAOKA PEPITES 6
15.35
5.03
Client N 1
50.30
Total Ttc
50.30€
Carte Bancaire
50.30€
TVA
5.50 %
Ht
Tva
47.68
2.62
Ouvert de 08H30 à 19h30 Non-Stop
du lundi au samedi.
Aucun remboursement ou echange sans ticket de ca
isse et sur la librairie.
28/06/2024 18:21
TO00007658707*
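
A test like the one above could be scripted roughly as follows. This is a sketch, not the project's implementation: it assumes the Cloud Vision REST endpoint `images:annotate` with an API key, uses only the standard library, and the helper names (`build_request_body`, `annotate`, `store_response`, `extract_text`) and the choice to store the JSON next to the proof image are illustrative assumptions.

```python
import base64
import json
from pathlib import Path
from urllib import request

VISION_URL = "https://vision.googleapis.com/v1/images:annotate"


def build_request_body(image_bytes: bytes) -> dict:
    """Build the JSON body for a single-image TEXT_DETECTION request."""
    return {
        "requests": [
            {
                "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
                "features": [{"type": "TEXT_DETECTION"}],
            }
        ]
    }


def annotate(image_bytes: bytes, api_key: str) -> dict:
    """POST the image to Cloud Vision (needs network access and a valid key)."""
    req = request.Request(
        f"{VISION_URL}?key={api_key}",
        data=json.dumps(build_request_body(image_bytes)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.load(resp)


def store_response(response: dict, proof_image_path: Path) -> Path:
    """Save the raw JSON next to the proof image (assumed storage layout)."""
    json_path = proof_image_path.with_suffix(".json")
    json_path.write_text(json.dumps(response, ensure_ascii=False), encoding="utf-8")
    return json_path


def extract_text(response: dict, flatten: bool = False) -> str:
    """Return the full detected text, optionally replacing "\n" with " "."""
    responses = response.get("responses") or [{}]
    text = responses[0].get("fullTextAnnotation", {}).get("text", "")
    return text.replace("\n", " ").strip() if flatten else text
```

Storing the unmodified response keeps the bounding boxes as well as the text, so a later parser can be rerun without calling the API again.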
raphodn commented 2 days ago

What's missing is a structured response.

Google Cloud has Document AI, which might help: https://cloud.google.com/document-ai/docs/ce-mechanisms

I also read the following article, but there the parsing is done manually, which is tedious: https://betterprogramming.pub/google-vision-and-google-sheets-api-line-by-line-receipt-parsing-2e2661261cda
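
To give an idea of what manual parsing involves, here is a minimal sketch that turns the raw OCR text from the quick test into structured items. It assumes the layout of that one receipt (quantity, unit `kg` or `un`, name, unit price, line total) and flattens the text first because the OCR splits items across lines inconsistently. `ReceiptItem`, `ITEM_RE` and `parse_items` are illustrative names; discount lines such as "(Remise de -14.29%)" are simply ignored.

```python
import re
from dataclasses import dataclass
from decimal import Decimal

# quantity, unit, name, unit price, line total, as printed on the sample receipt
ITEM_RE = re.compile(
    r"(?<![\d.])(?P<qty>\d+(?:\.\d+)?) (?P<unit>kg|un) "
    r"(?P<name>.+?) "
    r"(?P<unit_price>\d+\.\d{2}) (?P<total>\d+\.\d{2})(?!\d)"
)


@dataclass
class ReceiptItem:
    quantity: Decimal
    unit: str
    name: str
    unit_price: Decimal
    total: Decimal

    def is_consistent(self) -> bool:
        """Check that quantity * unit price matches the total within one cent."""
        return abs(self.quantity * self.unit_price - self.total) <= Decimal("0.01")


def parse_items(ocr_text: str) -> list[ReceiptItem]:
    """Extract items from OCR text, whatever the line breaks look like."""
    flat = " ".join(ocr_text.split())  # same effect as replacing "\n" with " "
    return [
        ReceiptItem(
            quantity=Decimal(m["qty"]),
            unit=m["unit"],
            name=m["name"],
            unit_price=Decimal(m["unit_price"]),
            total=Decimal(m["total"]),
        )
        for m in ITEM_RE.finditer(flat)
    ]
```

A pattern like this is tied to one store's receipt format, which is the tedious part: each new layout would need its own rules.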

teolemon commented 2 days ago

Looking at a tutorial from Google:

"b. Follow the same and create a custom processor that can process the invoice from your nearest supermarket that you visit most often (Document AI custom processors can support a [wide range of languages](https://cloud.google.com/document-ai/docs/languages)). On average, we have observed ~15 receipts to train and to test each supermarket receipt model."

raphodn commented 2 days ago

Let's talk about it during next Wednesday's meeting :)

But Document AI seems quite easy to set up, and we could start fairly easily by manually labeling the receipts for our top 20 or 50 stores. Of course I would rather have an in-house, reusable solution, but it could be an interesting experiment nevertheless.