kba / hocr-spec

The hOCR Embedded OCR Workflow and Output Format
http://kba.github.io/hocr-spec/1.2/

Reusing HTML: Some text is missing in the first paragraph #96

Open amitdo opened 7 years ago

amitdo commented 7 years ago

Reusing HTML

This document describes a representation of various aspects of OCR output in an XML-like format. That is, we define as set of tags containing text and other tags, together with attributes of those tags. However, since the content we are representing is formatted text,

However, we are not actually using a new XML for the representation; instead embed the representation in

Some text is missing in the first paragraph.

Also:
define as set => define a set
instead embed => instead we embed

kba commented 7 years ago

Some text is missing in the first paragraph

That's still from the Google Doc. I'm open to suggestions, and I'll mark it in the spec.

The typos I'll fix right away, thanks.