Open devingfx opened 3 years ago
I developed a draft of a bill data extractor: PDF > detect header/footer info (store, address, date, SIRET, etc.) > parse table rows to CSV.
The PDF files are currently generated by an external app, TextFairy, which uses Tesseract to extract the text and its positioning.
I found OpenFoodFacts while searching for a way to get product info from a "partial general name" + store.
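To make the pipeline above a bit more concrete, here is a minimal Python sketch of the "detect header info > parse rows to CSV" steps. It is not the actual draft: it assumes the OCR text is already available as plain lines (not TextFairy's PDF output), that the first non-empty line is the store name, that dates look like `dd/mm/yyyy`, and that item rows end with a price. The names `parse_receipt` and `to_csv` are hypothetical.

```python
import csv
import io
import re

# Assumed formats: a SIRET is 14 digits, often printed in 3-3-3-5 groups.
SIRET_RE = re.compile(r"\b(\d{3})\s?(\d{3})\s?(\d{3})\s?(\d{5})\b")
DATE_RE = re.compile(r"\b(\d{2})/(\d{2})/(\d{4})\b")
# An item row: free-text label, then a price such as 1,20 or 0.95, optional currency.
ROW_RE = re.compile(r"^(?P<label>.+?)\s+(?P<price>\d+[.,]\d{2})\s*(?:€|EUR)?$")


def parse_receipt(lines):
    """Split OCR text lines into header info and (label, price) rows."""
    header = {"store": None, "siret": None, "date": None}
    rows = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if header["store"] is None:
            # Assumption: the first non-empty line is the store name.
            header["store"] = line
            continue
        siret = SIRET_RE.search(line)
        if siret and header["siret"] is None:
            header["siret"] = "".join(siret.groups())
            continue
        date = DATE_RE.search(line)
        if date and header["date"] is None:
            day, month, year = date.groups()
            header["date"] = f"{year}-{month}-{day}"
            continue
        row = ROW_RE.match(line)
        if row and not row.group("label").upper().startswith("TOTAL"):
            # Prices stay strings, normalized to a dot decimal separator.
            rows.append((row.group("label"), row.group("price").replace(",", ".")))
        # Any other line (address, totals, footer) is ignored in this sketch.
    return header, rows


def to_csv(header, rows):
    """Write one CSV line per item, repeating the header info on each line."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["store", "siret", "date", "label", "price"])
    for label, price in rows:
        writer.writerow([header["store"], header["siret"], header["date"], label, price])
    return out.getvalue()
```

Real receipts would need more than regexes (multi-line labels, quantities, discounts), and the positioning data from Tesseract would probably be a better way to find table columns than matching on line endings.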
@devingfx We had made this prototype during a hackathon: https://github.com/openreceipts/openreceipts-server
@devingfx I'm not sure whether you're still interested in the subject, but we've launched Open Prices (https://prices.openfoodfacts.org), a crowdsourced database of food product prices around the world. Having ML to automatically extract data from receipts/price tags would help tremendously.
I read somewhere that table recognition is on the roadmap... Once it is ready, it could be used to scan "bills" or invoices and extract product prices by brand/store/date.
With shared price information, price comparators and other apps would become possible...
Privacy probably needs to be discussed though! Maybe a mix of anonymous price data and a way to keep the pictures/OCR/data local on the user's device (i.e. let app owners run OCR locally).
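As a rough illustration of that split (only a sketch, with hypothetical names and fields), the app could keep a full record on the device and share only an allow-listed subset of its fields:

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class LocalReceiptItem:
    """Full record kept on the user's device (hypothetical fields)."""
    label: str
    price: str
    store: str
    date: str
    image_path: str  # receipt picture, never leaves the device
    ocr_text: str    # raw OCR output, never leaves the device


# Only these fields are shared; everything else stays local.
SHARED_FIELDS = ("label", "price", "store", "date")


def to_shared_record(item):
    """Return the anonymous price data for one item as a plain dict."""
    full = asdict(item)
    return {name: full[name] for name in SHARED_FIELDS}
```

Using an allow-list rather than deleting known private fields means a field added to the local record later is not shared by accident.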
My 2 cents