kba / hocr-spec

The hOCR Embedded OCR Workflow and Output Format
http://kba.github.io/hocr-spec/1.2/

Link differently to classes, title properties, capabilities #53

Status: Closed (kba closed this 7 years ago)

kba commented 7 years ago

So far, we're using the auto-generated id= attributes of the headings for internal linking, which is great for linking to "non-normative" sections but not ideal when referring to specific terms like classes (see #46).

We're using bikeshed to build the HTML spec. Both WHATWG and W3C use it for specs such as CSS and HTML, so it comes with a lot of shortcuts for defining and linking to various types of terms, values, interfaces, etc. To make use of this, we need to map hOCR terminology to CSS terminology.

To reference terms within the specs, I'd propose this scheme:

| hOCR terminology | CSS terminology | Definition | Link |
| --- | --- | --- | --- |
| class | element | `<dfn element>ocr_page</dfn>` | `<{ocr_page}>` |
| title property | property | `<dfn property>bbox</dfn>` | `'bbox'` |
| metadata | property | `<dfn property>ocr-system</dfn>` | `'ocr-system'` |
| capability | property value | `<dfn for="ocr-capabilities">ocrp_lang</dfn>` | `''ocr-capabilities/ocrp_lang''` |

See http://kba.github.io/test/hocr-relink/ for a preview. Notice the list of terms and the backreferences if you click e.g. on http://kba.github.io/test/hocr-relink/#bbox.
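
As a rough sketch of how the proposed scheme might look in the Bikeshed source, the fragment below pairs each definition form with its link shorthand. Only the `<dfn>` and link forms are taken from the scheme above. The surrounding headings and sentences are hypothetical placeholders, not text from the hOCR spec, and the rendering depends on Bikeshed resolving the shorthands as the scheme assumes.

```html
<!-- Hypothetical placeholder prose; only the dfn and link forms come from the proposal. -->

<!-- Definitions: each term is wrapped in a <dfn> once, where it is introduced. -->
<h3><dfn element>ocr_page</dfn></h3>
<p>Placeholder text defining the <dfn property>bbox</dfn> title property.</p>
<p>Placeholder text defining the <dfn property>ocr-system</dfn> metadata field.</p>
<p>Placeholder text defining the capability
   <dfn for="ocr-capabilities">ocrp_lang</dfn>.</p>

<!-- References: the shorthands link back to the definitions above. -->
<p>Placeholder sentence that mentions <{ocr_page}>, 'bbox', 'ocr-system'
   and ''ocr-capabilities/ocrp_lang''.</p>
```

With this layout each term has a single definition site, so links no longer depend on the auto-generated heading ids.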

amitdo commented 7 years ago

Nice. :)

I think a section should not reference itself (See ocr_carea).

kba commented 7 years ago

> I think a section should not reference itself (See ocr_carea).

It's not; it refers to the element that is defined in the section heading. For consistency, I would leave it as it is.

I'll go ahead and merge this now to avoid conflicts, since the changes are all over the text. I'll integrate the classification above into the text.