Open teolemon opened 2 weeks ago
Store the number of items, and make it available so that we can nudge the user properly
Quick test with Google Cloud Vision:
Input | Output (after replacing "\n" by " ") |
---|---|
SATORIZ COMBOIRE Chers clientes et clients Le magasin est ouvert de 8h30 à19h30 Merci de vo tre compréhention 12 Rue des Montagnes de Lans 38 130 ECHIROLLES 04.76.40.42 34 Impression sur Caisse N : 1 Date Impression : 27/06/2024 a 18:17:34 Ticket de la caisse N : 1 Ticket: 7658707 Date Ticket: 27/06/2024 18:15:37 Serveur : ENORA Ticket ENCAISSER QTE PU € PP € 0.464 kg KA RE FOURRE CH 10.70 4.96 0.476 kg KA RE FOURRE CH 10.70 5.09 0.238 kg LEVURE MALTEE V 29.95 7.13 0.216 kg RAISINS SULTANI 7.20 1.56 1 un POIS CHICHES 24 1.15 1.15 1 un TOMATES CONCASS 1.05 1.05 1 un POIS CHICHES 24 1.15 1.15 1 un TOMATES PELEES 1.05 1.05 1 un MAYONNAISE NATU 4.55 4.55 1 un MOUTARDE DIJON 1.80 1.80 (Remise de -14.29%) 1 un MOUTARDE ANCIEN 1.80 1.80 (Remise de -14.29%) 1 un LAIT COCO 17% 2 1.05 1.05 0.422 kg COURGETTE 3.60 1.52 0.262 kg AUBERGINE 4.50 1.18 0.496 kg SUCRE CANNE ROU 2.25 1.12 0.788 kg RIZ BASMATI BLA 3.85 3.03 0.210 kg ABRICOTS BRUNS 17.15 3.60 0.138 kg MANGUES vrac 17.95 2.48 0.328 kg KAOKA PEPITES 6 15.35 5.03 Client N 1 50.30 Total Ttc 50.30€ Carte Bancaire 50.30€ TVA 5.50 % Ht Tva 47.68 2.62 Ouvert de 08H30 à 19h30 Non-Stop du lundi au samedi. Aucun remboursement ou echange sans ticket de ca isse et sur la librairie. 28/06/2024 18:21 TO00007658707* |
// with "\n"
SATORIZ COMBOIRE
Chers clientes et clients
Le magasin est ouvert de 8h30 à19h30 Merci de vo
tre compréhention
12 Rue des Montagnes de Lans
38 130 ECHIROLLES
04.76.40.42 34
Impression sur Caisse N : 1
Date Impression : 27/06/2024 a 18:17:34
Ticket de la caisse N : 1
Ticket: 7658707
Date Ticket: 27/06/2024 18:15:37
Serveur : ENORA
Ticket ENCAISSER
QTE
PU €
PP €
0.464 kg KA RE FOURRE CH
10.70
4.96
0.476 kg KA RE FOURRE CH
10.70
5.09
0.238 kg LEVURE MALTEE V
29.95
7.13
0.216 kg RAISINS SULTANI
7.20
1.56
1 un POIS CHICHES 24
1.15
1.15
1 un TOMATES CONCASS
1.05
1.05
1
un POIS CHICHES 24
1.15
1.15
1
un TOMATES PELEES
1.05
1.05
1
un MAYONNAISE NATU
4.55
4.55
1
un MOUTARDE DIJON
1.80
1.80
(Remise de -14.29%)
1 un MOUTARDE ANCIEN 1.80 1.80
(Remise de -14.29%)
1 un LAIT COCO 17% 2 1.05
1.05
0.422 kg COURGETTE
3.60
1.52
0.262 kg AUBERGINE
4.50
1.18
0.496 kg SUCRE CANNE ROU
2.25
1.12
0.788 kg RIZ BASMATI BLA
3.85
3.03
0.210 kg ABRICOTS BRUNS
17.15
3.60
0.138 kg MANGUES vrac
17.95
2.48
0.328 kg KAOKA PEPITES 6
15.35
5.03
Client N 1
50.30
Total Ttc
50.30€
Carte Bancaire
50.30€
TVA
5.50 %
Ht
Tva
47.68
2.62
Ouvert de 08H30 à 19h30 Non-Stop
du lundi au samedi.
Aucun remboursement ou echange sans ticket de ca
isse et sur la librairie.
28/06/2024 18:21
TO00007658707*
What's missing is having a structured response...
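As a rough illustration of what a structured response could look like, here is a minimal sketch that pulls line items out of the raw OCR text with a regular expression. It assumes the layout seen in this one receipt (quantity, then `kg` or `un`, then the label, then unit price and line total, optionally followed by a `(Remise de …%)` line). The names `ReceiptItem`, `ITEM_RE` and `parse_items` are made up for this example, and other stores will most likely need other patterns.

```python
import re
from dataclasses import dataclass
from decimal import Decimal
from typing import List, Optional

# Assumed layout (taken from this single receipt only):
#   <qty> <kg|un> <label> <unit price> <line total> [(Remise de -xx.xx%)]
ITEM_RE = re.compile(
    r"(?<![\d.])(?P<qty>\d+(?:\.\d+)?)\s+(?P<unit>kg|un)\s+"
    r"(?P<label>.+?)\s+"
    r"(?P<unit_price>\d+\.\d{2})\s+(?P<total>\d+\.\d{2})(?!\d)"
    r"(?:\s+\(Remise de (?P<discount>-?\d+(?:\.\d+)?)%\))?"
)


@dataclass
class ReceiptItem:
    quantity: Decimal
    unit: str
    label: str
    unit_price: Decimal
    total: Decimal
    discount_percent: Optional[Decimal] = None


def parse_items(ocr_text: str) -> List[ReceiptItem]:
    """Extract line items from OCR text, ignoring where the line breaks fell."""
    # The OCR splits an item over a varying number of lines, so flatten first.
    flat = " ".join(ocr_text.split())
    items = []
    for m in ITEM_RE.finditer(flat):
        discount = m.group("discount")
        items.append(
            ReceiptItem(
                quantity=Decimal(m.group("qty")),
                unit=m.group("unit"),
                label=m.group("label"),
                unit_price=Decimal(m.group("unit_price")),
                total=Decimal(m.group("total")),
                discount_percent=Decimal(discount) if discount else None,
            )
        )
    return items


# Excerpt of the OCR output above, with its original line breaks.
SAMPLE = """QTE
PU €
PP €
0.464 kg KA RE FOURRE CH
10.70
4.96
1
un POIS CHICHES 24
1.15
1.15
1
un MOUTARDE DIJON
1.80
1.80
(Remise de -14.29%)
1 un LAIT COCO 17% 2 1.05
1.05
Client N 1
50.30
"""

for item in parse_items(SAMPLE):
    print(item)
```

With something like this, `len(parse_items(text))` would also give the number of items mentioned in the issue description, without needing a full document-understanding model.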
Maybe Google Cloud's Document AI could provide that: https://cloud.google.com/document-ai/docs/ce-mechanisms
I also read the following article, but the parsing there is done manually and looks tedious: https://betterprogramming.pub/google-vision-and-google-sheets-api-line-by-line-receipt-parsing-2e2661261cda
Do we want to go that far?
Are the items themselves that useful in a first phase? Should we limit ourselves to grabbing the store info and the number of items, if that is easy?
Another option would be a drag-and-drop UI to reorder things?
Shouldn't we then start adding a "Label on receipt" field, which we could then reuse for subsequent purchases from the same store?
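To make the "Label on receipt" idea more concrete, here is a minimal sketch of a per-store lookup: once a label has been matched to a product for a given store, the same label on a later receipt from that store can be resolved automatically. Everything here is hypothetical (`ReceiptLabelIndex`, the `store_id` and `product_code` values); it only shows the shape of the data, not how it would be stored in practice.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


def normalize_label(label: str) -> str:
    """Upper-case and collapse whitespace so OCR spacing differences don't matter."""
    return " ".join(label.upper().split())


@dataclass
class ReceiptLabelIndex:
    # (store_id, normalized receipt label) -> product code
    _known: Dict[Tuple[str, str], str] = field(default_factory=dict)

    def remember(self, store_id: str, receipt_label: str, product_code: str) -> None:
        """Record that this label, in this store, refers to this product."""
        self._known[(store_id, normalize_label(receipt_label))] = product_code

    def lookup(self, store_id: str, receipt_label: str) -> Optional[str]:
        """Return the product previously associated with this label, if any."""
        return self._known.get((store_id, normalize_label(receipt_label)))


index = ReceiptLabelIndex()
# Placeholder identifiers, not real store or product codes.
index.remember("store-123", "POIS CHICHES 24", "product-abc")
print(index.lookup("store-123", "pois  chiches 24"))
print(index.lookup("other-store", "POIS CHICHES 24"))
```

Keying on the store as well as the label matters because the same truncated label can mean different products in different shops.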
[ ] Extract SIRET
[ ] Extract phone number
[ ] Extract store name
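For these three extraction tasks, a first pass might not need anything beyond regular expressions. The sketch below assumes French formats: a 10-digit phone number starting with `0`, and a 14-digit SIRET checked with the Luhn algorithm. Taking the first non-empty line as the store name is a naive guess that happens to work on the receipt above. The function names are made up for this example, and the SIRET used in the usage lines is a Luhn-valid number chosen for illustration (the receipt above does not contain one).

```python
import re
from typing import Optional

# 10 digits starting with 0, with optional spaces or dots between pairs.
PHONE_RE = re.compile(r"(?<!\d)0[1-9](?:[ .]?\d{2}){4}(?!\d)")
# 14 digits, optionally grouped as 3 + 3 + 3 + 5.
SIRET_RE = re.compile(r"(?<!\d)\d{3} ?\d{3} ?\d{3} ?\d{5}(?!\d)")


def luhn_valid(digits: str) -> bool:
    """Luhn checksum: double every second digit counting from the right."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_phone(text: str) -> Optional[str]:
    """Return the first phone-like number as bare digits, or None."""
    m = PHONE_RE.search(text)
    return re.sub(r"\D", "", m.group()) if m else None


def find_siret(text: str) -> Optional[str]:
    """Return the first 14-digit candidate that passes the Luhn check, or None."""
    for m in SIRET_RE.finditer(text):
        candidate = m.group().replace(" ", "")
        if luhn_valid(candidate):
            return candidate
    return None


def guess_store_name(text: str) -> Optional[str]:
    """Naive heuristic: the first non-empty line of the receipt."""
    for line in text.splitlines():
        if line.strip():
            return line.strip()
    return None


# Header of the OCR output above, with its original line breaks.
HEADER = """SATORIZ COMBOIRE
Chers clientes et clients
12 Rue des Montagnes de Lans
38 130 ECHIROLLES
04.76.40.42 34
Impression sur Caisse N : 1
"""

print(guess_store_name(HEADER))
print(find_phone(HEADER))
print(find_siret(HEADER))
print(find_siret("SIRET 732 829 320 00074"))
```

The Luhn check is there to avoid mistaking other long digit runs on a receipt (ticket numbers, barcodes) for a SIRET.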
Looking at a tutorial from Google:
b. Follow the same and create a custom processor that can process the invoice from your nearest supermarket that you visit most often (Document AI custom processors can support a [wide range of languages](https://cloud.google.com/document-ai/docs/languages)). On average, we have observed ~15 receipts to train and to test each supermarket receipt model.
Let's talk about it during next Wednesday's meeting :)
But Document AI seems quite easy to set up, and we could quite easily start by manually labeling the receipts for our top 20 or 50 stores. Of course I would rather have an in-house, reusable solution, but it could be an interesting experiment nevertheless.