Closed tangb closed 6 years ago
Since it generated an output.txt instead of an output.hocr or output.html, my guess would be that you're missing the configuration file hocr (/usr/share/tesseract-ocr/tessdata/configs/hocr with Tesseract 3.05 in Debian). If so, the error shouldn't have been silenced, however.
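A quick way to test that guess is to look for a `configs/hocr` file under the likely tessdata directories. The sketch below is only an illustration: the two default directories are the paths mentioned in this thread, the helper names are made up, and treating `TESSDATA_PREFIX` as either the tessdata directory or its parent is an assumption that depends on the Tesseract version.

```python
import os
from pathlib import Path


def candidate_tessdata_dirs(env=None):
    """Return directories that might contain tessdata (hypothetical helper)."""
    env = os.environ if env is None else env
    # The two locations mentioned in this thread.
    dirs = [
        Path("/usr/share/tesseract-ocr/tessdata"),
        Path("/usr/local/share/tessdata"),
    ]
    prefix = env.get("TESSDATA_PREFIX")
    if prefix:
        # Assumption: the variable may point at tessdata itself or at its parent.
        dirs = [Path(prefix), Path(prefix) / "tessdata"] + dirs
    return dirs


def find_hocr_configs(tessdata_dirs):
    """Return every existing configs/hocr file under the given directories."""
    found = []
    for directory in tessdata_dirs:
        config = Path(directory) / "configs" / "hocr"
        if config.is_file():
            found.append(config)
    return found


if __name__ == "__main__":
    configs = find_hocr_configs(candidate_tessdata_dirs())
    if configs:
        for config in configs:
            print("found:", config)
    else:
        print("no configs/hocr file found in the candidate directories")
```

If nothing is found, or the file turns up somewhere Tesseract does not look, a tessdata path mismatch is the likely cause.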
I compiled tesseract myself and I do have a hocr file in /usr/local/share/tessdata/configs/hocr. I got this output with the --print-parameters command-line option:
tesseract --print-parameters | grep hocr
hocr_font_info 0 Add font info to hocr output
tessedit_create_hocr 0 Write .html hOCR output file
Does 0 mean disabled? Is there a way to make sure tesseract uses the hocr config file mentioned above?
Thank you for your help
0 = disabled. But this is to be expected, since you didn't tell Tesseract to use the hocr configuration file. Compare the output without and with the hocr config on the command line:
% tesseract --print-parameters | grep hocr
hocr_font_info 0 Add font info to hocr output
tessedit_create_hocr 0 Write .html hOCR output file
% tesseract --print-parameters randomfile.jpeg randomoutputfile hocr | grep hocr
hocr_font_info 0 Add font info to hocr output
tessedit_create_hocr 1 Write .html hOCR output file
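The same comparison can be scripted. This is a minimal sketch, not part of pyocr: `parse_parameters` and `print_parameters` are hypothetical helper names, and the parser assumes each line has the form name, value, description, separated by tabs or, as pasted in this thread, by spaces.

```python
import subprocess


def parse_parameters(output):
    """Map parameter names to their values from --print-parameters output."""
    params = {}
    for line in output.splitlines():
        # Assumption: fields are tab-separated; fall back to whitespace
        # for output that was pasted with spaces, as in this thread.
        parts = line.split("\t") if "\t" in line else line.split(None, 2)
        if len(parts) >= 2:
            params[parts[0].strip()] = parts[1].strip()
    return params


def print_parameters(*extra_args):
    """Run `tesseract --print-parameters [extra_args...]` and parse stdout."""
    result = subprocess.run(
        ["tesseract", "--print-parameters", *extra_args],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    return parse_parameters(result.stdout)


# Output copied from the commands above.
WITHOUT_CONFIG = """\
hocr_font_info 0 Add font info to hocr output
tessedit_create_hocr 0 Write .html hOCR output file
"""
WITH_CONFIG = """\
hocr_font_info 0 Add font info to hocr output
tessedit_create_hocr 1 Write .html hOCR output file
"""

if __name__ == "__main__":
    print(parse_parameters(WITHOUT_CONFIG)["tessedit_create_hocr"])
    print(parse_parameters(WITH_CONFIG)["tessedit_create_hocr"])
```

On a machine with Tesseract installed, `print_parameters("randomfile.jpeg", "randomoutputfile", "hocr")` would mirror the second command above. If `tessedit_create_hocr` still reads 0 there, the hocr config file is probably not being picked up.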
What is the content of your /usr/local/share/tessdata/configs/hocr?
I completely rebuilt tesseract from the latest version and the problem is gone. It seems it was an issue with my tessdata path: the files were installed in a different place...
Thank you for your help ;-)
You're welcome
Hello
I'm trying to use pyocr with tesseract 4.0.0 alpha and I get a "file not found" error during generation. It works well with TextBuilder and DigitBuilder but fails with LineBoxBuilder and WordBoxBuilder.
I'm running Debian jessie (v8).
Can you help me? Thank you :smile:
Tesseract info:
Python code
Content of tmp during process: