pdfliberation / whatwordwhere

Tools for extracting data from scanned paper forms that have been OCRed by Tesseract and saved in the hOCR format.
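
As a rough illustration of the kind of data hOCR carries, the sketch below pulls each recognized word and its bounding box out of an hOCR document using only the Python standard library. It is not this project's API: the names `HocrWordParser`, `extract_words`, and `SAMPLE` are hypothetical, and it assumes Tesseract-style output in which each word is a `span` with class `ocrx_word` and a `title` attribute containing `bbox x0 y0 x1 y1`.

```python
import re
from html.parser import HTMLParser

# hOCR stores geometry in the title attribute, e.g. "bbox 36 92 96 116; x_wconf 95"
BBOX_RE = re.compile(r"bbox (\d+) (\d+) (\d+) (\d+)")


class HocrWordParser(HTMLParser):
    """Collects (text, (x0, y0, x1, y1)) for every ocrx_word span (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.words = []
        self._bbox = None   # bbox of the word span currently open, if any
        self._text = []     # text fragments seen inside that span
        self._depth = 0     # nested <span> elements inside the word span

    def handle_starttag(self, tag, attrs):
        if self._bbox is not None:
            # Already inside a word; only nested spans affect where it closes.
            if tag == "span":
                self._depth += 1
            return
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "span" and "ocrx_word" in classes:
            match = BBOX_RE.search(attrs.get("title") or "")
            if match:
                self._bbox = tuple(int(v) for v in match.groups())
                self._text = []
                self._depth = 0

    def handle_data(self, data):
        if self._bbox is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if self._bbox is None or tag != "span":
            return
        if self._depth > 0:
            self._depth -= 1
            return
        # The word span itself has closed: record it.
        self.words.append(("".join(self._text).strip(), self._bbox))
        self._bbox = None


def extract_words(hocr_text):
    """Return a list of (word, bbox) tuples in document order."""
    parser = HocrWordParser()
    parser.feed(hocr_text)
    parser.close()
    return parser.words


# Hand-written sample in the shape Tesseract emits; not real form data.
SAMPLE = """<div class='ocr_page' title='bbox 0 0 600 800'>
<span class='ocr_line' title='bbox 36 92 300 116'>
<span class='ocrx_word' id='word_1_1' title='bbox 36 92 96 116; x_wconf 95'>Name:</span>
<span class='ocrx_word' id='word_1_2' title='bbox 110 92 180 116; x_wconf 91'><strong>Smith</strong></span>
</span></div>"""

if __name__ == "__main__":
    for word, bbox in extract_words(SAMPLE):
        print(word, bbox)
```

Once every word has a position, a form field can be read by looking for the words whose boxes fall inside a known region of the page, or to the right of a known label such as `Name:`.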