jacklicn / tesseract-ocr

Automatically exported from code.google.com/p/tesseract-ocr

Meta tags not closed in hocr output #565

Closed: GoogleCodeExporter closed this issue 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. Call tesseract with +var_file.txt, where var_file.txt contains the string "tessedit_create_hocr 1"

What is the expected output? What do you see instead?
I expect
<meta http-equiv="Content-Type" content="text/html;charset=utf-8" />
<meta name='ocr-system' content='tesseract' />
or 
<meta http-equiv="Content-Type" content="text/html;charset=utf-8"></meta>
<meta name='ocr-system' content='tesseract'></meta>

but I see

<meta http-equiv="Content-Type" content="text/html;charset=utf-8" >
<meta name='ocr-system' content='tesseract'>
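
A minimal sketch of why the difference matters, assuming the hOCR output is fed to a strict XML parser (unclosed meta tags are acceptable to an HTML parser, so this only affects XML tooling). The two head fragments are copied from above; the <head> wrapper and the helper name is_well_formed are added here for illustration only and are not part of tesseract.

```python
import xml.etree.ElementTree as ET

# Fragments copied from the report. The <head> wrapper is an assumption added
# so that each fragment has a single root element for the XML parser.
OBSERVED = """<head>
<meta http-equiv="Content-Type" content="text/html;charset=utf-8" >
<meta name='ocr-system' content='tesseract'>
</head>"""

EXPECTED = """<head>
<meta http-equiv="Content-Type" content="text/html;charset=utf-8" />
<meta name='ocr-system' content='tesseract' />
</head>"""


def is_well_formed(fragment: str) -> bool:
    """Hypothetical helper: True if the fragment parses as XML."""
    try:
        ET.fromstring(fragment)
        return True
    except ET.ParseError:
        # Unclosed <meta> elements leave </head> without a matching open tag.
        return False


if __name__ == "__main__":
    print("observed output well-formed:", is_well_formed(OBSERVED))
    print("expected output well-formed:", is_well_formed(EXPECTED))
```

The same check could be run on a full hOCR file by passing its contents to the helper, provided the file has no other constructs that a strict XML parser rejects.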

What version of the product are you using? On what operating system?
3.01 from binaries, Windows XP

Please provide any additional information below.

Original issue reported on code.google.com by dvp...@gmail.com on 28 Oct 2011 at 4:58

GoogleCodeExporter commented 9 years ago
I cannot reproduce this. I ran:
tesseract eurotext.tif eurotext hocr

and I got:
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" 
"http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<title></title>
<meta http-equiv="Content-Type" content="text/html;charset=utf-8" />
<meta name='ocr-system' content='tesseract'/>
</head>
...

Original comment by zde...@gmail.com on 28 Nov 2011 at 8:39