openfoodfacts / openfoodfacts-ai

This is a tracking repo for all our AI projects. 🍕 🤖🍼

Extract data from receipts/bills and enrich OFF with prices notion #65

Open devingfx opened 3 years ago

devingfx commented 3 years ago

I read somewhere that table recognition is on the roadmap. When it's ready, we could scan "bills" or invoices to extract product prices by brand/store/date.

With shared price information, price comparators and other apps would become possible.

Privacy would need to be discussed, though! Maybe a mix of anonymous price data and a way to keep the pictures/OCR/data local on the user's device (i.e. let app owners run OCR locally).
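One way to picture that split: the receipt is parsed on the device, and only a small, non-identifying record per item is shared. This is a minimal sketch, assuming a hypothetical locally parsed receipt dict (keys `store`, `currency`, `purchased_at`, `items`) and a hypothetical `SharedPrice` record; which fields are actually safe to share is exactly the part that would need discussing.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class SharedPrice:
    """Hypothetical record of what would leave the device: no photo, no OCR text."""
    product: str
    price: float
    currency: str
    store: str
    date: str  # purchase day only; the time of purchase is dropped


def anonymise(receipt):
    """Build shareable price records from a receipt parsed locally on the device.

    Any other key in `receipt` (loyalty card, payment details, raw OCR text...)
    is never copied into the shared records.
    """
    day = datetime.fromisoformat(receipt["purchased_at"]).date().isoformat()
    return [
        SharedPrice(
            product=item["product"],
            price=item["price"],
            currency=receipt["currency"],
            store=receipt["store"],
            date=day,
        )
        for item in receipt["items"]
    ]


# Example: the loyalty card number stays on the device.
receipt = {
    "store": "Epicerie du coin",
    "currency": "EUR",
    "purchased_at": "2021-03-12T10:45:00",
    "loyalty_card": "1234567890",
    "items": [{"product": "Lait demi-écrémé 1 L", "price": 0.95}],
}
shared = anonymise(receipt)
```

Using a whitelist of fields (rather than deleting known-sensitive ones) means a new field added to the local parser is private by default.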

My 2 cents

devingfx commented 3 years ago

I developed a draft bill data extractor: PDF > detect header/footer info (store, address, date, SIRET, etc.) > parse table rows to CSV.
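The header/footer and table-row steps could look roughly like this when working on plain OCR text lines. This is a minimal sketch, not the draft extractor itself: the regexes (French-style `dd/mm/yyyy` dates, 14-digit SIRET, prices with `,` or `.` decimals), the "first line is the store name" rule and the `parse_receipt` / `rows_to_csv` names are all assumptions.

```python
import csv
import io
import re

# Hypothetical patterns; real receipts vary a lot between stores.
SIRET_RE = re.compile(r"SIRET\s*:?\s*(\d{3}\s?\d{3}\s?\d{3}\s?\d{5})")
DATE_RE = re.compile(r"\b(\d{2}/\d{2}/\d{4})\b")
# An item row: a label followed by a price like "0,95" or "2.10", optionally "€".
ROW_RE = re.compile(r"^(?P<label>.+?)\s+(?P<price>\d+[.,]\d{2})\s*€?$")


def parse_receipt(lines):
    """Split OCR'd receipt lines into header/footer info and item rows."""
    header, rows = {}, []
    for line in (raw.strip() for raw in lines):
        if not line:
            continue
        if "store" not in header:
            # Assumption: the first non-empty line is the store name.
            header["store"] = line
        elif m := SIRET_RE.search(line):
            header["siret"] = m.group(1).replace(" ", "")
        elif "date" not in header and (m := DATE_RE.search(line)):
            header["date"] = m.group(1)
        elif m := ROW_RE.match(line):
            label = m.group("label").strip()
            price = m.group("price").replace(",", ".")
            if label.upper().startswith("TOTAL"):
                header["total"] = price  # footer line, not an item
            else:
                rows.append((label, price))
    return header, rows


def rows_to_csv(rows):
    """Serialise (label, price) rows to CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["label", "price"])
    writer.writerows(rows)
    return buf.getvalue()


lines = [
    "EPICERIE DU COIN",
    "SIRET: 123 456 789 00012",
    "12/03/2021 10:45",
    "LAIT DEMI-ECREME 1L   0,95",
    "PAIN DE MIE 2.10 €",
    "TOTAL 3,05",
]
header, rows = parse_receipt(lines)
csv_text = rows_to_csv(rows)
```

Regexes like these are brittle across store layouts (requires Python 3.8+ for `:=`), so they are best seen as a baseline to compare learned extraction against.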

For now, the PDF files are generated by an external app, TextFairy, which uses Tesseract to extract text and positioning.
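Because Tesseract returns word positions as well as text, table rows can be rebuilt by grouping words that sit at the same height. A minimal sketch, assuming one dict per word with `text`, `left`, `top` and `height` in pixels (the same fields pytesseract's `image_to_data(..., output_type=Output.DICT)` returns as parallel lists); the `group_words_into_rows` name and the `tolerance` heuristic are hypothetical, and Tesseract's own `line_num` is an alternative to this grouping.

```python
def group_words_into_rows(words, tolerance=0.5):
    """Group OCR word boxes into text rows by vertical position.

    Two words are put on the same row when their vertical centres differ by at
    most `tolerance` times the taller of the two heights (row reference = the
    first word placed in that row).
    """
    rows = []  # each row: {"center": float, "height": float, "words": [...]}
    for word in sorted(words, key=lambda w: w["top"]):
        if not word["text"].strip():
            continue  # Tesseract also emits entries with empty text; skip them
        center = word["top"] + word["height"] / 2
        for row in rows:
            if abs(row["center"] - center) <= tolerance * max(row["height"], word["height"]):
                row["words"].append(word)
                break
        else:
            rows.append({"center": center, "height": word["height"], "words": [word]})
    # Rows top to bottom, words left to right inside each row.
    return [
        " ".join(w["text"] for w in sorted(row["words"], key=lambda w: w["left"]))
        for row in sorted(rows, key=lambda r: r["center"])
    ]


words = [
    {"text": "0,95", "left": 300, "top": 52, "height": 20},
    {"text": "LAIT", "left": 10, "top": 50, "height": 20},
    {"text": "PAIN", "left": 10, "top": 80, "height": 20},
    {"text": "1L", "left": 60, "top": 51, "height": 20},
    {"text": "2,10", "left": 300, "top": 81, "height": 20},
]
rows = group_words_into_rows(words)
```

Slightly skewed photos shift the price column up or down a few pixels; the height-relative tolerance is what keeps `0,95` on the same row as `LAIT 1L` here.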

I found OpenFoodFacts while searching for a way to get product info from a "partial general name" + store.
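Matching an abbreviated receipt label to a full product name is a fuzzy-matching problem. A minimal sketch using Python's standard `difflib.SequenceMatcher`, assuming a hypothetical local list of candidate names (for instance, products already known for that store); it does not call any Open Food Facts API, and the `normalise` / `best_match` names and the `min_score` threshold are illustrative.

```python
import unicodedata
from difflib import SequenceMatcher


def normalise(name):
    """Lowercase, strip accents and hyphens so "DEMI-ECR" can meet "demi-écrémé"."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(no_accents.lower().replace("-", " ").split())


def best_match(receipt_label, candidates, min_score=0.6):
    """Return (candidate, score) for the closest candidate, or None if none is close enough."""
    label = normalise(receipt_label)
    scored = [
        (candidate, SequenceMatcher(None, label, normalise(candidate)).ratio())
        for candidate in candidates
    ]
    best = max(scored, key=lambda pair: pair[1], default=None)
    if best is None or best[1] < min_score:
        return None
    return best


# Hypothetical names of products known for one store.
store_products = ["Lait demi-écrémé 1 L", "Pain de mie complet", "Yaourt nature"]
match = best_match("LAIT DEMI-ECR 1L", store_products)
```

Narrowing the candidates by store first keeps both the cost and the false matches down; the 0.6 threshold is arbitrary and would need tuning on real receipts.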

teolemon commented 3 years ago

@devingfx We had made this prototype during a hackathon: https://github.com/openreceipts/openreceipts-server

raphael0202 commented 7 months ago

@devingfx I'm not sure whether you're still interested in the subject, but we've launched Open Prices (https://prices.openfoodfacts.org), a crowdsourced database of food product prices from around the world. Using ML to automatically extract data from receipts and price tags would help tremendously.