sirfz / tesserocr

A Python wrapper for the tesseract-ocr API
MIT License

RuntimeError : Error Reading Image #234

Closed sankha90 closed 4 years ago

sankha90 commented 4 years ago

I am trying to set an image using tesserocr with this code:

with PyTessBaseAPI() as api:
    for file in filename:
        api.Init(lang = 'eng')
        api.SetImageFile(file)
        #print (api.AllWordConfidences())
        arr = list(api.AllWordConfidences())
        sumarr = sum(arr) / float(len(arr))

The error I am getting:

Traceback (most recent call last):
  File "", line 4, in
    api.SetImageFile(file)
  File "tesserocr.pyx", line 1597, in tesserocr._tesserocr.PyTessBaseAPI.SetImageFile
RuntimeError: Error reading image

And my version:

print(tesserocr.tesseract_version())
print(tesserocr.get_languages())
tesseract 4.0.0
leptonica-1.76.0 (Jan 8 2019, 13:34:23) [MSC v.1900 LIB Release x64]
libgif 5.1.4 : libjpeg 9b : libpng 1.6.35 : libtiff 4.0.9 : zlib 1.2.11 : libwebp 0.6.1 : libopenjp2 2.3.0
('C:\Users\subhr\anaconda3\envs\ocr\/tessdata/', ['afr', 'amh', 'ara', 'asm', 'aze', 'aze_cyrl', 'bel', 'ben', 'bod', 'bos', 'bre', 'bul', 'cat', 'ceb', 'ces', 'chi_sim', 'chi_sim_vert', 'chi_tra', 'chi_tra_vert', 'chr', 'cos', 'cym', 'dan', 'deu', 'div', 'dzo', 'ell', 'eng', 'enm', 'epo', 'equ', 'est', 'eus', 'fao', 'fas', 'fil', 'fin', 'fra', 'frk', 'frm', 'fry', 'gla', 'gle', 'glg', 'grc', 'guj', 'hat', 'heb', 'hin', 'hrv', 'hun', 'hye', 'iku', 'ind', 'isl', 'ita', 'ita_old', 'jav', 'jpn', 'jpn_vert', 'kan', 'kat', 'kat_old', 'kaz', 'khm', 'kir', 'kmr', 'kor', 'lao', 'lat', 'lav', 'lit', 'ltz', 'mal', 'mar', 'mkd', 'mlt', 'mon', 'mri', 'msa', 'mya', 'nep', 'nld', 'nor', 'oci', 'ori', 'osd', 'pan', 'pol', 'por', 'pus', 'que', 'ron', 'rus', 'san', 'script/Arabic', 'script/Armenian', 'script/Bengali', 'script/Canadian_Aboriginal', 'script/Cherokee', 'script/Cyrillic', 'script/Devanagari', 'script/Ethiopic', 'script/Fraktur', 'script/Georgian', 'script/Greek', 'script/Gujarati', 'script/Gurmukhi', 'script/HanS', 'script/HanS_vert', 'script/HanT', 'script/HanT_vert', 'script/Hangul', 'script/Hangul_vert', 'script/Hebrew', 'script/Japanese', 'script/Japanese_vert', 'script/Kannada', 'script/Khmer', 'script/Lao', 'script/Latin', 'script/Malayalam', 'script/Myanmar', 'script/Oriya', 'script/Sinhala', 'script/Syriac', 'script/Tamil', 'script/Telugu', 'script/Thaana', 'script/Thai', 'script/Tibetan', 'script/Vietnamese', 'sin', 'slk', 'slv', 'snd', 'spa', 'spa_old', 'sqi', 'srp', 'srp_latn', 'sun', 'swa', 'swe', 'syr', 'tam', 'tat', 'tel', 'tgk', 'tha', 'tir', 'ton', 'tur', 'uig', 'ukr', 'urd', 'uzb', 'uzb_cyrl', 'vie', 'yid', 'yor'])

kba commented 4 years ago

Is the image corrupt perhaps? Can you do

from PIL import Image
Image.open('/path/to/image.png')

in Python and

identify path/to/image.tif

in bash and post the output?

Also, you're better off running tesserocr in Linux, e.g. in WSL if you're on Win10.

sankha90 commented 4 years ago

@kba you were right, there was a problem with the image. I have changed the input path and it's working now 👍
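
For anyone landing here with the same error: since the cause turned out to be a bad input path, a cheap pre-check before calling SetImageFile can show which file is the problem. Below is a minimal sketch using only the standard library. The helper name `precheck_image` and the `SIGNATURES` table are made up for illustration and are not part of tesserocr. The signature list is deliberately short, so a file that fails this check may still be a format Leptonica can read (WebP, for example).

```python
from pathlib import Path

# Leading bytes of a few common image formats (well-known file signatures).
# Illustrative only: this is NOT the full list of formats Leptonica supports.
SIGNATURES = {
    b"\x89PNG\r\n\x1a\n": "png",
    b"\xff\xd8\xff": "jpeg",
    b"II*\x00": "tiff",  # little-endian TIFF
    b"MM\x00*": "tiff",  # big-endian TIFF
    b"GIF87a": "gif",
    b"GIF89a": "gif",
    b"BM": "bmp",
}


def precheck_image(path):
    """Return None if the file looks readable, else a short reason string.

    Hypothetical helper for illustration; not part of tesserocr.
    """
    p = Path(path)
    if not p.is_file():
        return "not a file (wrong path?)"
    with p.open("rb") as fh:
        head = fh.read(8)  # the longest signature above is 8 bytes
    if not head:
        return "empty file"
    if not any(head.startswith(sig) for sig in SIGNATURES):
        return "unrecognized image signature"
    return None


# Usage sketch with the loop from the question (requires tesserocr;
# `filename` is the list of paths from the original post):
#
# with PyTessBaseAPI(lang='eng') as api:
#     for file in filename:
#         reason = precheck_image(file)
#         if reason is not None:
#             print(file, "skipped:", reason)
#             continue
#         api.SetImageFile(file)
```

This only rules out the obvious cases (missing file, empty file, not an image at all). A file with a valid header can still be truncated or corrupt, in which case opening it with PIL as suggested above is the better test.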