ocropus / hocr-tools

Tools for manipulating and evaluating the hOCR format for representing multi-lingual OCR results by embedding them into HTML.

alto input? #155

Closed jtlz2 closed 4 years ago

jtlz2 commented 4 years ago

I am trying to use your excellent tools to compare ALTO files from ABBYY and tesseract, but I haven't found a reliable way to convert ALTO into hOCR in order to do so.

Do you have any plans to support ALTO input?

I have tried to get ocr-fileformat to do the conversion - so far in vain.

Thanks for all help

zuphilip commented 4 years ago

The hocr-tools are tools for working with hOCR files. A transformation from ALTO to hOCR is out of scope here, but it is the main purpose of ocr-fileformat, which should already support this transformation. Let us know there if you have any problems with it.

jtlz2 commented 4 years ago

Huge thanks and sorry to pollute - see https://github.com/UB-Mannheim/ocr-fileformat/issues/89 where I have described the problem at hand.
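
For readers who only need word boxes for a rough comparison, the core of the ALTO-to-hOCR mapping can be sketched in a few lines of Python. This is an illustrative sketch only, not how ocr-fileformat performs the conversion: the function name `alto_words_to_hocr` is hypothetical, it emits bare `ocrx_word` spans rather than a complete hOCR page, and it assumes the ALTO coordinates are already in pixels (ALTO files may declare other measurement units, which would need rescaling first).

```python
import xml.etree.ElementTree as ET
from html import escape


def local(tag):
    """Strip any XML namespace so the sketch works across ALTO schema versions."""
    return tag.rsplit("}", 1)[-1]


def alto_words_to_hocr(alto_xml):
    """Return one hOCR word span per ALTO <String> element (hypothetical helper)."""
    root = ET.fromstring(alto_xml)
    spans = []
    for el in root.iter():
        if local(el.tag) != "String":
            continue
        # ALTO stores position plus size; an hOCR bbox is "x0 y0 x1 y1".
        # Assumption: values are in pixels, so no unit conversion is done.
        x0 = round(float(el.get("HPOS", 0)))
        y0 = round(float(el.get("VPOS", 0)))
        x1 = x0 + round(float(el.get("WIDTH", 0)))
        y1 = y0 + round(float(el.get("HEIGHT", 0)))
        text = escape(el.get("CONTENT", ""))
        spans.append(
            f"<span class='ocrx_word' title='bbox {x0} {y0} {x1} {y1}'>{text}</span>"
        )
    return "\n".join(spans)
```

Page, block, and line structure, confidences, and hyphenation are all ignored here, so a dedicated converter such as ocr-fileformat remains the better route for anything beyond a quick check.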