Open kba opened 7 years ago
It's not clear whether Tom really wanted both ocr_cinfo and ocrx_cinfo.
IIUC:
- ocr_cinfo is a generalized form of word, a placeholder to set character cuts.
- ocrx_word is a "word". In quotation marks, since the engine defines what it means by "word".
- ocrx_cinfo would then be something within a line that has distinguishable codepoints but is not a "word" in the sense of the engine.
- x_bboxes is only mentioned for ocrx_cinfo
- cuts is only mentioned for ocr_cinfo
Since neither ocr_cinfo nor ocrx_cinfo seems to have semantics beyond "can contain character-level coordinates", ocrx_cinfo seems redundant.
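To illustrate why the class name adds nothing here, a rough Python sketch of how a consumer might read character-level coordinates. It is not taken from the spec. It assumes the title attribute is a semicolon-separated property list and that x_bboxes carries four integers per character. The helper names and the sample title value are made up.

```python
def parse_title(title):
    """Split an hOCR-style title attribute into {property_name: [values]}."""
    props = {}
    for chunk in title.split(";"):
        parts = chunk.split()
        if parts:
            props[parts[0]] = parts[1:]
    return props


def char_boxes(props):
    """Group x_bboxes values into (x0, y0, x1, y1) tuples, one per character."""
    nums = [int(v) for v in props.get("x_bboxes", [])]
    if len(nums) % 4:
        raise ValueError("x_bboxes needs a multiple of four values")
    return [tuple(nums[i:i + 4]) for i in range(0, len(nums), 4)]


# Hypothetical title value, invented for illustration only.
title = "bbox 10 20 50 40; x_bboxes 10 20 30 40 30 20 50 40"
props = parse_title(title)
boxes = char_boxes(props)
```

Nothing in this code depends on whether the element's class is ocr_cinfo or ocrx_cinfo. The properties alone carry the information.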
Related, should be rebased: https://github.com/kba/hocr-spec/commit/6fdbbbf28
Spec says
but not what ocrx_cinfo actually is.