renard314 / textfairy

Android OCR App

Autodetect bills and invoices and allow to share the extracted data with an expense manager app #37

Open renard314 opened 10 years ago

devingfx commented 3 years ago

Yes! I would love that. I'm currently testing several Tesseract-based apps to find the best base, hoping to modify the source code so that it gives me an hOCR version of the recognized text. The OCR itself works quite well, but the text output is really bad for bills, where the white space has to be preserved. OCR apps generally just concatenate the text from Tesseract's boxes without keeping the position information, so the resulting "paragraph" mess is not easily usable for bills. With hOCR output, an external app could position the text back to recover the lines and columns of a bill, or in fact of any layout that is not a book page.
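To make the idea concrete, here is a minimal Python sketch of what such an external app might do with hOCR output. It is only an illustration, not anything taken from textfairy: the function names `parse_words` and `layout_text`, the `char_width` and `line_tolerance` values, and the sample receipt are all made up for the example. It assumes the hOCR marks each word as a `span` with class `ocrx_word` and a `title` attribute containing `bbox x0 y0 x1 y1`, with `class` appearing before `title`, which is how Tesseract's hOCR output usually looks.

```python
import re
from html import unescape

# Assumption: each word is a <span class='ocrx_word' ... title='bbox x0 y0 x1 y1; ...'>,
# with the class attribute written before the title attribute.
WORD_RE = re.compile(
    r"<span[^>]*class=['\"]ocrx_word['\"][^>]*"
    r"title=['\"][^'\"]*?bbox (\d+) (\d+) (\d+) (\d+)[^'\"]*['\"][^>]*>(.*?)</span>",
    re.S,
)


def parse_words(hocr):
    """Return a list of (x0, y0, x1, y1, text) tuples for every word span."""
    words = []
    for x0, y0, x1, y1, inner in WORD_RE.findall(hocr):
        # Drop nested markup such as <strong>/<em> and decode HTML entities.
        text = unescape(re.sub(r"<[^>]+>", "", inner)).strip()
        if text:
            words.append((int(x0), int(y0), int(x1), int(y1), text))
    return words


def layout_text(words, char_width=10, line_tolerance=10):
    """Rebuild plain text that keeps rough line and column positions.

    char_width: hypothetical number of pixels represented by one character cell.
    line_tolerance: maximum vertical distance (pixels) between word centers
    for two words to be treated as part of the same line.
    """
    rows = []  # each entry: [vertical center of the row's first word, words]
    for word in sorted(words, key=lambda w: (w[1], w[0])):
        center = (word[1] + word[3]) / 2
        if rows and abs(center - rows[-1][0]) <= line_tolerance:
            rows[-1][1].append(word)
        else:
            rows.append([center, [word]])

    lines = []
    for _, row in rows:
        line = ""
        for x0, _, _, _, text in sorted(row):
            column = x0 // char_width
            if line:
                # Pad up to the target column, keeping at least one space.
                line = line.ljust(max(column, len(line) + 1))
            else:
                line = " " * column
            line += text
        lines.append(line)
    return "\n".join(lines)


if __name__ == "__main__":
    # Made-up two-line receipt fragment in hOCR form.
    sample = """
    <span class='ocrx_word' id='w1' title='bbox 0 0 50 20; x_wconf 96'>Coffee</span>
    <span class='ocrx_word' id='w2' title='bbox 300 0 340 20; x_wconf 95'>3.50</span>
    <span class='ocrx_word' id='w3' title='bbox 0 30 30 50; x_wconf 96'>Tea</span>
    <span class='ocrx_word' id='w4' title='bbox 300 30 340 50; x_wconf 93'>2.00</span>
    """
    print(layout_text(parse_words(sample)))
```

With the sample above, the two amounts end up in the same text column, which is exactly the information that gets lost when the boxes are simply concatenated. A real bill would probably need a proper HTML parser instead of a regular expression, plus a `char_width` derived from the actual glyph sizes and some handling for skewed scans.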