madmaze / pytesseract

A Python wrapper for Google Tesseract
Apache License 2.0

PermissionError: [WinError 5] Access is denied #282

Closed: kethan1 closed this issue 4 years ago

kethan1 commented 4 years ago

Hi, I am trying to run the provided sample code. I have installed Tesseract from Google. It is not on my PATH, so I specified its location in pytesseract.pytesseract.tesseract_cmd.

OS: Windows 10
Python: 3.8.3
Tesseract installation location: C:\Users\ketha\AppData\Local\Tesseract-OCR

Error:

Traceback (most recent call last):
  File "C:\Users\ketha\Downloads\pytesseract-master\tests\data\test.py", line 12, in <module>
    print(pytesseract.image_to_string(Image.open('test.png')))
  File "C:\Users\ketha\AppData\Local\Programs\Python\Python38\lib\site-packages\pytesseract\pytesseract.py", line 356, in image_to_string
    return {
  File "C:\Users\ketha\AppData\Local\Programs\Python\Python38\lib\site-packages\pytesseract\pytesseract.py", line 359, in <lambda>
    Output.STRING: lambda: run_and_get_output(*args),
  File "C:\Users\ketha\AppData\Local\Programs\Python\Python38\lib\site-packages\pytesseract\pytesseract.py", line 270, in run_and_get_output
    run_tesseract(**kwargs)
  File "C:\Users\ketha\AppData\Local\Programs\Python\Python38\lib\site-packages\pytesseract\pytesseract.py", line 241, in run_tesseract
    raise e
  File "C:\Users\ketha\AppData\Local\Programs\Python\Python38\lib\site-packages\pytesseract\pytesseract.py", line 238, in run_tesseract
    proc = subprocess.Popen(cmd_args, **subprocess_args())
  File "C:\Users\ketha\AppData\Local\Programs\Python\Python38\lib\subprocess.py", line 854, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "C:\Users\ketha\AppData\Local\Programs\Python\Python38\lib\subprocess.py", line 1307, in _execute_child
    hp, ht, pid, tid = _winapi.CreateProcess(executable, args,
PermissionError: [WinError 5] Access is denied

Code:

try:
    from PIL import Image
except ImportError:
    import Image
import pytesseract

pytesseract.pytesseract.tesseract_cmd = r'C:/Users/ketha/AppData/Local/Tesseract-OCR'

print(pytesseract.image_to_string(Image.open('test.png')))

print(pytesseract.image_to_string(Image.open('test-european.jpg'), lang='fra'))

print(pytesseract.image_to_string('test.png'))

print(pytesseract.image_to_string('images.txt'))

try:
    print(pytesseract.image_to_string('test.jpg', timeout=2)) # Timeout after 2 seconds
    print(pytesseract.image_to_string('test.jpg', timeout=0.5)) # Timeout after half a second
except RuntimeError as timeout_error:
    pass

print(pytesseract.image_to_boxes(Image.open('test.png')))

print(pytesseract.image_to_data(Image.open('test.png')))

print(pytesseract.image_to_osd(Image.open('test.png')))

pdf = pytesseract.image_to_pdf_or_hocr('test.png', extension='pdf')
with open('test.pdf', 'w+b') as f:
    f.write(pdf) # pdf type is bytes by default

hocr = pytesseract.image_to_pdf_or_hocr('test.png', extension='hocr')

bozhodimitrov commented 4 years ago

Hi @kethan1, and thank you for reporting the issue. Can you share the link to the Tesseract-OCR installer that you used?

Also, tesseract_cmd should be the full path to the tesseract executable itself, not to the installation directory. For example:

pytesseract.pytesseract.tesseract_cmd = r'C:/Users/ketha/AppData/Local/Tesseract-OCR/tesseract'
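The PermissionError in the first traceback is consistent with tesseract_cmd pointing at the installation directory: subprocess.Popen asks Windows to execute a folder, and CreateProcess refuses with WinError 5. As a minimal sketch, one could normalize the setting before running OCR. Note that resolve_tesseract_cmd is a hypothetical helper, not part of pytesseract; it accepts the executable, its installation directory, or nothing (in which case it searches PATH).

```python
import os
import shutil
import sys
from typing import Optional


def resolve_tesseract_cmd(configured: Optional[str] = None) -> str:
    """Return a path suitable for pytesseract.pytesseract.tesseract_cmd.

    Hypothetical helper (not part of pytesseract). `configured` may be the
    Tesseract executable, its installation directory, or None to search PATH.
    """
    exe_name = "tesseract.exe" if sys.platform == "win32" else "tesseract"

    if configured is None:
        # Same situation as having Tesseract on PATH: let the OS lookup find it.
        found = shutil.which("tesseract")
        if found is None:
            raise FileNotFoundError("tesseract is not on PATH; pass its location explicitly")
        return found

    # Executing a directory fails (WinError 5 on Windows), so look inside it.
    if os.path.isdir(configured):
        candidate = os.path.join(configured, exe_name)
    else:
        candidate = configured

    if os.path.isfile(candidate):
        return candidate
    # Accept ".../tesseract" as shorthand for ".../tesseract.exe" on Windows.
    if sys.platform == "win32" and os.path.isfile(candidate + ".exe"):
        return candidate + ".exe"
    raise FileNotFoundError(f"no Tesseract executable found at {candidate!r}")
```

With the value from the original report, pytesseract.pytesseract.tesseract_cmd = resolve_tesseract_cmd(r'C:/Users/ketha/AppData/Local/Tesseract-OCR') would resolve to the tesseract.exe inside that folder, assuming the installer places the binary at the top level of the install directory, as the path above suggests.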
kethan1 commented 4 years ago

Hi, I have changed the path, and now I am getting a different error on this line: print(pytesseract.image_to_string(Image.open('test-european.jpg'), lang='fra')). The full installer link is: https://digi.bib.uni-mannheim.de/tesseract/tesseract-ocr-w32-setup-v5.0.0-alpha.20200328.exe

The error I am getting is this:

Traceback (most recent call last):
  File "C:\Users\ketha\Downloads\pytesseract-master\tests\data\test.py", line 17, in <module>
    print(pytesseract.image_to_string(Image.open('test-european.jpg'), lang='fra'))
  File "C:\Users\ketha\AppData\Local\Programs\Python\Python38\lib\site-packages\pytesseract\pytesseract.py", line 356, in image_to_string
    return {
  File "C:\Users\ketha\AppData\Local\Programs\Python\Python38\lib\site-packages\pytesseract\pytesseract.py", line 359, in <lambda>
    Output.STRING: lambda: run_and_get_output(*args),
  File "C:\Users\ketha\AppData\Local\Programs\Python\Python38\lib\site-packages\pytesseract\pytesseract.py", line 270, in run_and_get_output
    run_tesseract(**kwargs)
  File "C:\Users\ketha\AppData\Local\Programs\Python\Python38\lib\site-packages\pytesseract\pytesseract.py", line 246, in run_tesseract
    raise TesseractError(proc.returncode, get_errors(error_string))
pytesseract.pytesseract.TesseractError: (1, 'Error opening data file C:\\Users\\ketha\\AppData\\Local\\Tesseract-OCR/tessdata/fra.traineddata Please make sure the TESSDATA_PREFIX environment variable is set to your "tessdata" directory. Failed loading language \'fra\' Tesseract couldn\'t load any languages! Could not initialize tesseract.')

bozhodimitrov commented 4 years ago

This one is not a pytesseract-related problem. It seems that you didn't install the French tessdata language files together with Tesseract itself. By default, the installer that you linked above installs only the English language data.
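The error message shows Tesseract looking for fra.traineddata in the tessdata folder under the installation directory. A minimal sketch for checking this before calling image_to_string with a lang argument follows. missing_languages is a hypothetical helper, not a pytesseract function; it assumes the language data lives in that tessdata folder, as in the message above, and it only checks that the files exist, not that they are valid. It accepts Tesseract's "eng+fra" syntax for requesting several languages.

```python
import os
from typing import List


def missing_languages(tessdata_dir: str, lang: str) -> List[str]:
    """Return the language codes in `lang` (e.g. "eng+fra") that have no
    matching <code>.traineddata file in `tessdata_dir`.

    Hypothetical helper: checks file presence only, not file contents.
    """
    codes = [code for code in lang.split("+") if code]
    return [
        code
        for code in codes
        if not os.path.isfile(os.path.join(tessdata_dir, f"{code}.traineddata"))
    ]
```

On the setup described in this thread, missing_languages(r'C:\Users\ketha\AppData\Local\Tesseract-OCR\tessdata', 'fra') would be expected to report ['fra'] until the French data is installed.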

kethan1 commented 4 years ago

Can you give me the installer link for all the languages for Windows 10?

bozhodimitrov commented 4 years ago

The installer that you used already offers this: it shows a list of languages to install, so you can just check the boxes for the languages that you need.

kethan1 commented 4 years ago

Oh, okay, thank you so much for your help. Thanks for building the module!!