empty page -> broken hOCR

What steps will reproduce the problem?
Run Tesseract with hOCR output on a fully white page.

What is the expected output? What do you see instead?
I would expect the output to be a valid HTML document. Instead, validation reports end tags for elements that were never opened:
# Error Line 10, Column 7: end tag for element "SPAN" which is not open
# Error Line 11, Column 4: end tag for element "P" which is not open
# Error Line 13, Column 6: end tag for element "DIV" which is not open

What version of the product are you using? On what operating system?
Tesseract 3.00, Linux.

Original issue reported on code.google.com by jwilk@jwilk.net on 10 Nov 2010 at 11:39

Attachments:

tmp.html
