houqp / leptess

Productive and safe Rust binding for leptonica and tesseract
https://houqp.github.io/leptess/leptess/index.html
MIT License
258 stars, 28 forks

Add hocr support #28

Closed (ccouzens closed 3 years ago)

ccouzens commented 3 years ago

hOCR presumably stands for HTML OCR. It represents the OCR result for an image as HTML, with attributes on each element describing where the corresponding word appears in the image.

https://github.com/houqp/leptess/issues/27

Example output (from different project): https://github.com/antimatter15/tesseract-rs/blob/3edc4e7658a63aeefe371091e3133bcc24ec02f6/img.html
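
To make the "attributes describing where each word appears" part concrete: in hOCR, a word is typically a `span` whose `title` attribute carries a `bbox x0 y0 x1 y1` property in image pixels, often followed by other properties such as a confidence value. The sketch below is not code from this PR; the `BBox` struct, the `parse_bbox` helper, and the sample `title` string are hypothetical and only illustrate how a consumer might read the position back out.

```rust
/// Bounding box of a word in image pixel coordinates
/// (left, top, right, bottom). Hypothetical type for illustration.
#[derive(Debug, PartialEq)]
struct BBox {
    x0: u32,
    y0: u32,
    x1: u32,
    y1: u32,
}

/// Parse the `bbox` property out of an hOCR `title` attribute value,
/// e.g. "bbox 36 92 96 116; x_wconf 95".
/// Returns `None` if there is no well-formed `bbox` property.
fn parse_bbox(title: &str) -> Option<BBox> {
    // Properties are separated by ';'. Pick the one that starts with "bbox ".
    let prop = title
        .split(';')
        .map(str::trim)
        .find(|p| p.starts_with("bbox "))?;

    // Skip the "bbox" keyword, then parse the four coordinates in order.
    let mut nums = prop
        .split_whitespace()
        .skip(1)
        .map(|n| n.parse::<u32>().ok());

    Some(BBox {
        x0: nums.next()??,
        y0: nums.next()??,
        x1: nums.next()??,
        y1: nums.next()??,
    })
}

fn main() {
    // Assumed sample value, in the style of a word-level hOCR element.
    let title = "bbox 36 92 96 116; x_wconf 95";
    println!("{:?}", parse_bbox(title));
}
```

A real consumer would more likely run the hOCR string through an HTML parser and then apply something like `parse_bbox` to each word element's `title`, rather than handling attribute strings by hand.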