kba / hocr-spec

The hOCR Embedded OCR Workflow and Output Format
http://kba.github.io/hocr-spec/1.2/

Classification of properties #94

Open kba opened 7 years ago

kba commented 7 years ago

https://github.com/kba/hocr-spec/pull/84#issuecomment-256556599:

I am a little skeptical that the classifications for the properties are useful. Maybe, we should rather try to indicate the elements on which this property can be used?

kba commented 7 years ago

Originally, the spec listed the properties under the element categories. That led to duplication (e.g. ocr_separator appearing under both floats and typesetting). Now the properties are grouped into these categories, but a property can also be listed in other categories. The list is just everything I could think of and could be reduced. IMHO it makes sense to be able to say: "ocr_line/ocrx_line can contain any inline properties"
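To make the grouping idea concrete, here is a minimal sketch of how a property could belong to several categories while each element only names the categories it accepts. The category names and the assignments below are hypothetical and chosen for illustration; they are not the spec's actual lists.

```python
# Hypothetical assignments, for illustration only (not taken from the spec).
# A property may sit in more than one category without being defined twice.
PROPERTY_CATEGORIES = {
    "bbox": {"general"},
    "baseline": {"inline"},
    "textangle": {"inline", "layout"},
    "ppageno": {"page"},
}

# Each element class names the categories it accepts, which is what lets
# one say "ocr_line/ocrx_line can contain any inline properties".
ELEMENT_ALLOWS = {
    "ocr_line": {"general", "inline"},
    "ocrx_line": {"general", "inline"},
    "ocr_page": {"general", "page"},
}


def allowed_properties(element_class):
    """Return the sorted property names an element class may carry."""
    allowed = ELEMENT_ALLOWS.get(element_class, set())
    return sorted(
        name
        for name, categories in PROPERTY_CATEGORIES.items()
        if categories & allowed
    )
```

With these assumed tables, `allowed_properties("ocr_line")` and `allowed_properties("ocrx_line")` return the same list, because both classes accept the same categories.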

kba commented 7 years ago

Maybe, we should rather try to indicate the elements on which this property can be used?

You get these if you click on the dfn in the heading for a property. From the perspective of an hOCR processor, it makes more sense IMHO to iterate over the elements and parse the properties according to the element definition, rather than the other way around.
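A rough sketch of that element-first approach, using only Python's standard `html.parser`: it walks the elements, looks up what each element class is expected to carry, and parses the `title` attribute against that definition. The `ELEMENT_PROPERTIES` table, `parse_title` and `process` are hypothetical names and assignments made up for this example, and the title parsing is deliberately naive (it does not handle quoted values that contain semicolons or spaces).

```python
from html.parser import HTMLParser

# Hypothetical per-element definitions, for illustration only.
ELEMENT_PROPERTIES = {
    "ocr_page": {"bbox", "ppageno"},
    "ocr_line": {"bbox", "baseline"},
    "ocrx_word": {"bbox", "x_wconf"},
}


def parse_title(title):
    """Split 'name v1 v2; name v1' into {name: [values]} (naive split)."""
    props = {}
    for chunk in title.split(";"):
        parts = chunk.split()
        if parts:
            props[parts[0]] = parts[1:]
    return props


class ElementFirstParser(HTMLParser):
    """Visit elements and parse properties per the element's definition."""

    def __init__(self):
        super().__init__()
        self.results = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        for cls in (attrs.get("class") or "").split():
            expected = ELEMENT_PROPERTIES.get(cls)
            if expected is None:
                continue  # not an element class this sketch knows about
            props = parse_title(attrs.get("title") or "")
            known = {k: v for k, v in props.items() if k in expected}
            unexpected = sorted(set(props) - expected)
            self.results.append((cls, known, unexpected))


def process(hocr):
    """Return (element class, expected properties, unexpected names)."""
    parser = ElementFirstParser()
    parser.feed(hocr)
    parser.close()
    return parser.results
```

Starting from the element means the processor knows up front which properties to look for, and anything else on the element can be reported or ignored in one place.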