kba / hocr-spec

The hOCR Embedded OCR Workflow and Output Format
http://kba.github.io/hocr-spec/1.2/

Relate to related formats #89

Status: Open. kba opened this issue 7 years ago.

kba commented 7 years ago

Formats such as ALTO, ABBYY and PAGE.

Syntactically, but more importantly, in scope and purpose.

Well, being able to represent engine-specific and intermediate information is the main point of hOCR. Having a uniform representation of OCR output is the point of ALTO. They are two different specs for two different purposes with two different use cases. https://github.com/kba/hocr-spec/issues/17#issuecomment-256131662
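To make the syntactic side concrete, here is a minimal sketch (not part of the spec, and not an official converter) that maps a single hOCR `ocrx_word` to an ALTO `String` element. hOCR stores geometry and engine properties in a free-form `title` attribute (`bbox x0 y0 x1 y1; x_wconf N`), while ALTO uses fixed attributes (`HPOS`, `VPOS`, `WIDTH`, `HEIGHT`, `CONTENT`, `WC`). The helper names `parse_title` and `hocr_word_to_alto` are hypothetical. Scaling `x_wconf` (assumed 0 to 100) to ALTO's `WC` (0 to 1) is also an assumption, and the ALTO namespace is omitted for brevity.

```python
import xml.etree.ElementTree as ET


def parse_title(title: str) -> dict:
    """Split an hOCR title such as 'bbox 10 20 110 40; x_wconf 93'
    into {'bbox': ['10', '20', '110', '40'], 'x_wconf': ['93']}."""
    props = {}
    for part in title.split(";"):
        tokens = part.split()
        if tokens:  # skip empty segments, e.g. from a trailing ';'
            props[tokens[0]] = tokens[1:]
    return props


def hocr_word_to_alto(span: ET.Element) -> ET.Element:
    """Hypothetical mapping of one hOCR word span to an ALTO <String>.

    Only bbox and x_wconf are mapped; any other (engine-specific)
    hOCR properties have no fixed ALTO slot and are dropped here.
    """
    props = parse_title(span.get("title", ""))
    # hOCR bbox is corner-based (x0 y0 x1 y1); ALTO uses origin plus size.
    x0, y0, x1, y1 = (int(v) for v in props["bbox"])
    string = ET.Element("String", {
        "HPOS": str(x0),
        "VPOS": str(y0),
        "WIDTH": str(x1 - x0),
        "HEIGHT": str(y1 - y0),
        "CONTENT": "".join(span.itertext()),
    })
    if "x_wconf" in props:
        # Assumption: x_wconf is on a 0-100 scale, ALTO WC on 0-1.
        string.set("WC", f"{float(props['x_wconf'][0]) / 100:.2f}")
    return string


if __name__ == "__main__":
    word = ET.fromstring(
        "<span class='ocrx_word' title='bbox 10 20 110 40; x_wconf 93'>hOCR</span>"
    )
    print(ET.tostring(hocr_word_to_alto(word), encoding="unicode"))
```

Any property beyond `bbox` and `x_wconf` simply disappears in this direction. That is a small illustration of the scope difference described in the quoted comment: hOCR can carry engine-specific detail that a uniform format like ALTO has no place for.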

See also:

kba commented 7 years ago

See also https://github.com/tmbdev/ocropy/issues/134

kba commented 7 years ago

'OCRopus File Formats' - Google Docs https://github.com/tmbdev/ocropy/issues/126