Closed by bclyc 7 years ago
from pprint import pprint
from PIL import Image
import pyocr
t = pyocr.get_available_tools()[0] # pyocr.tesseract
img = Image.open("test.png") # placeholder path: any image containing text
pprint(t.get_available_languages())
# ['chi_sim',
# (...)
# 'chi_tra',
# (...)
# 'deu-frak',
# (...)
# 'afr']
t.image_to_string(img, lang="chi_sim") # works for me
t.image_to_string(img, lang="deu-frak") # works for me too
t.image_to_string(img, lang="chi-sim") # fails
# TesseractError: (1, b'Tesseract Open Source OCR Engine v3.03 with Leptonica\nError opening data file
# /usr/share/tesseract-ocr/tessdata/chi-sim.traineddata\nPlease make sure the TESSDATA_PREFIX
# environment variable is set to the parent directory of your "tessdata" directory.\n
# Failed loading language \'chi-sim\'\nTesseract couldn\'t load any languages!\n
# Could not initialize tesseract.\n')
t.image_to_string(img, lang="deu_frak") # fails
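For me, this is not a bug in PyOCR. It is the expected behavior: the lang argument has to match the name of the installed .traineddata file exactly, including "-" versus "_".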
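If mixing up "-" and "_" is a recurring problem, one option is to resolve the requested name against the list of installed languages before calling the OCR tool. The following is only a sketch: resolve_lang is a hypothetical helper, not part of PyOCR, and the sample list is copied from the output above.

```python
def resolve_lang(requested, available):
    """Return the entry of `available` that matches `requested`.

    Hypothetical helper, not part of PyOCR. It treats "-" and "_" as
    interchangeable and ignores case, then returns the exact spelling
    found in `available`.
    """
    def normalize(name):
        return name.replace("-", "_").lower()

    # An exact match always wins.
    if requested in available:
        return requested

    matches = [name for name in available if normalize(name) == normalize(requested)]
    if len(matches) == 1:
        return matches[0]
    if not matches:
        raise ValueError(
            f"no installed language matches {requested!r}; available: {sorted(available)}"
        )
    raise ValueError(f"{requested!r} is ambiguous: {matches}")


# Sample list taken from the get_available_languages() output shown above.
available = ["chi_sim", "chi_tra", "deu-frak", "afr"]
print(resolve_lang("chi-sim", available))   # chi_sim
print(resolve_lang("deu_frak", available))  # deu-frak
```

With PyOCR this could be used as t.image_to_string(img, lang=resolve_lang("chi-sim", t.get_available_languages())), so the name passed on is always one that is actually installed.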
I got this error:
Then I tried the eng and fra traineddata files, and everything went well.
It took me a long time to find out that this was a naming problem. After I renamed the file from "chi-sim.traineddata" to "chi.traineddata" and updated the language name in my programs, everything worked. I guess it's because PyOCR has a problem reading data files with "-" in their names. However, the official tesseract doesn't have this issue.
Please fix this, thank you!
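For anyone hitting the same error, a quick way to see which language names will work is to list the .traineddata files in the tessdata directory, since the error message shows that the language name is used directly as the file name. This is a minimal sketch: installed_languages is a hypothetical helper, and the demo uses a temporary directory with made-up empty files so it runs anywhere. On a real system you would pass your own tessdata path instead (the error above mentions /usr/share/tesseract-ocr/tessdata).

```python
import tempfile
from pathlib import Path


def installed_languages(tessdata_dir):
    """Return the language codes implied by the <code>.traineddata files in a directory."""
    return sorted(p.stem for p in Path(tessdata_dir).glob("*.traineddata"))


# Demo with empty stand-in files; real traineddata files are not needed
# just to inspect the names.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("chi_sim.traineddata", "deu-frak.traineddata", "eng.traineddata"):
        (Path(tmp) / name).touch()
    langs = installed_languages(tmp)

print(langs)               # ['chi_sim', 'deu-frak', 'eng']
print("chi-sim" in langs)  # False: the file is named chi_sim, not chi-sim
```

Whatever spelling appears in this list is the one to pass as lang.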