typless / tesseract-aws-lambda

Project for blog on how to run tesseract on AWS Lambda

Still can't find the Tesseract on Lambda #2

Open Ageneinair opened 4 years ago

Ageneinair commented 4 years ago

Really great article, and it was a very clear starting point for me. Thank you! However, the AWS console shows that tesseract is still not installed:

START RequestId: d1674504-5404-4809-a941-63fbf94e96b9 Version: $LATEST
[ERROR] TesseractNotFoundError: tesseract is not installed or it's not in your PATH. See README file for more information.
Traceback (most recent call last):
  File "/var/task/ocr/handler.py", line 25, in handler
    result = run(image_bytes,oem,psm,only_extract_number)
  File "/var/task/ocr/handler.py", line 55, in run
    ocr_result = ocr(img,oem,psm,only_extract_number)
  File "/var/task/ocr/handler.py", line 48, in ocr
    result = pytesseract.image_to_string(img,config=custom_config)
  File "/opt/python/lib/python3.7/site-packages/pytesseract/pytesseract.py", line 374, in image_to_string
    }[output_type]()
  File "/opt/python/lib/python3.7/site-packages/pytesseract/pytesseract.py", line 373, in <lambda>
    Output.STRING: lambda: run_and_get_output(*args),
  File "/opt/python/lib/python3.7/site-packages/pytesseract/pytesseract.py", line 282, in run_and_get_output
    run_tesseract(**kwargs)
  File "/opt/python/lib/python3.7/site-packages/pytesseract/pytesseract.py", line 254, in run_tesseract
    raise TesseractNotFoundError()
END RequestId: d1674504-5404-4809-a941-63fbf94e96b9
REPORT RequestId: d1674504-5404-4809-a941-63fbf94e96b9  Duration: 1184.76 ms    Billed Duration: 1200 ms    Memory Size: 128 MB Max Memory Used: 77 MB  Init Duration: 388.44 ms

I don't know what I did wrong. Here is what my layer folder looks like: [screenshot: Screen Shot 2020-09-16 at 12 57 56 PM]. All related packages are in tesseract-packages.

Did I unzip the layer to the wrong path?

jangia commented 4 years ago

It's hard to tell much from the provided image alone, but here are some things I've noticed:

The most important thing is that the layer has the same structure as in the example. The Python dependencies should be installed inside site-packages (I omitted them for clarity).

Your folder structure is quite different. Python dependencies must be inside python/lib/python3.7/site-packages/. Otherwise they won't be resolved automatically.
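One way to check the layout is a small diagnostic like the sketch below. It is only an illustration: the helper name check_layer_layout is hypothetical, and it assumes the layer is unpacked under /opt (as the /opt/python/... paths in the traceback suggest) and that the runtime is Python 3.7. It reports whether the expected site-packages directory exists, what is inside it, and whether a tesseract executable can be found on PATH.

```python
import shutil
from pathlib import Path


def check_layer_layout(layer_root="/opt", python_version="3.7"):
    """Report whether a layer directory follows the expected layout.

    layer_root and python_version are assumptions; adjust them to your setup.
    """
    root = Path(layer_root)
    # Expected location of Python dependencies inside the layer.
    site_packages = (
        root / "python" / "lib" / f"python{python_version}" / "site-packages"
    )
    exists = site_packages.is_dir()
    return {
        "site_packages_path": str(site_packages),
        "site_packages_exists": exists,
        # Names of the top-level entries found in site-packages, if any.
        "packages": sorted(p.name for p in site_packages.iterdir()) if exists else [],
        # Full path of the tesseract executable, or None if it is not on PATH.
        "tesseract_on_path": shutil.which("tesseract"),
    }


if __name__ == "__main__":
    for key, value in check_layer_layout().items():
        print(f"{key}: {value}")
```

Calling check_layer_layout() temporarily from the handler and logging the result should show whether the dependencies ended up where the runtime looks for them. If site_packages_exists is true but tesseract_on_path is None, the Python packages are resolving and the missing piece is the tesseract binary itself.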

Hope it helps.