madmaze / pytesseract

A Python wrapper for Google Tesseract
Apache License 2.0

get_languages #551

Larbo53 closed 4 months ago

Larbo53 commented 4 months ago

Hi,

The print(pytesseract.get_languages(config='')) command returns an empty list. I'm using Python 3.9. I uninstalled pytesseract, restarted my MacBook, then reinstalled pytesseract, but I still have the same problem. How can I install the languages? Thanks for your feedback. Sincerely

stefan6419846 commented 4 months ago

This seems to be a Tesseract issue, not a pytesseract one.

Please verify that Tesseract can find any language files using tesseract --list-langs from the terminal. If this does not yield any languages, please install the language files the same way you installed Tesseract (using the same source usually ensures that the hard-coded data directory is valid) or download them manually and use the TESSDATA_PREFIX environment variable to point to them.
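The same check can be run from Python. This is only a minimal sketch: the helper names parse_list_langs and list_tesseract_languages are made up for illustration, and it assumes the output starts with a single header line followed by one language code per line.

```python
import subprocess


def parse_list_langs(output: str) -> list[str]:
    """Extract language codes from the text printed by `tesseract --list-langs`."""
    lines = [line.strip() for line in output.splitlines() if line.strip()]
    # Assumption: the first non-empty line is a header such as
    # 'List of available languages (2):' and the rest are language codes.
    return lines[1:]


def list_tesseract_languages(binary: str = "tesseract") -> list[str]:
    """Run `<binary> --list-langs`; return [] if the executable is missing."""
    try:
        result = subprocess.run(
            [binary, "--list-langs"],
            capture_output=True,
            text=True,
            check=False,
        )
    except FileNotFoundError:
        # The Tesseract executable is not installed or not on PATH.
        return []
    return parse_list_langs(result.stdout)


# Illustrative sample text, not output captured from a real installation.
sample = "List of available languages (2):\neng\nosd\n"
print(parse_list_langs(sample))
```

An empty result here points at the Tesseract installation or its data directory rather than at pytesseract.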

Larbo53 commented 4 months ago

I've just reinstalled tesseract with the pip command and the problem persists. How do I find and install the language file, and in which directory should it be stored? I'm using Python 3.9 and macOS Monterey v 12.75. Thanks for your help. Sincerely

stefan6419846 commented 4 months ago

pytesseract is just a wrapper around Tesseract, which needs to be installed separately. Please refer to the Tesseract project for further installation instructions: https://github.com/tesseract-ocr/tesseract?tab=readme-ov-file#installing-tesseract
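A quick way to see whether a separate Tesseract executable is visible at all is to look it up on PATH. The sketch below uses only the standard library; find_tesseract is a hypothetical helper name, and it assumes the executable is called tesseract.

```python
import shutil
from typing import Optional


def find_tesseract(binary: str = "tesseract") -> Optional[str]:
    """Return the full path of the executable if it is on PATH, else None."""
    return shutil.which(binary)


path = find_tesseract()
if path is None:
    print("No tesseract executable found on PATH; the engine itself may not be installed.")
else:
    print(f"tesseract executable found at {path}")
```

If the executable lives somewhere that is not on PATH, pytesseract can be pointed at it by assigning the full path to pytesseract.pytesseract.tesseract_cmd.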

Larbo53 commented 4 months ago

I just found the language files in 'usr/local/bin/tesseract-lang/4.1.0/share/tessdata'. In which file do I have to enter this path for it to work properly? Thanks for your help. Sincerely

stefan6419846 commented 4 months ago

In the best case, your Tesseract installation already picks this up. Otherwise, you have to set the environment variable TESSDATA_PREFIX accordingly - either in your global environment or inside your Python script with os.environ["TESSDATA_PREFIX"] = ....
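Setting the variable inside a script could look roughly like this. It is a sketch, not a definitive recipe: set_tessdata_prefix is a hypothetical helper, and the directory passed in is a placeholder that must be replaced with the real tessdata location. The path is resolved to an absolute one because a path without a leading slash is interpreted relative to the current working directory.

```python
import os
from pathlib import Path


def set_tessdata_prefix(tessdata_dir: str) -> str:
    """Point TESSDATA_PREFIX at tessdata_dir, stored as an absolute path."""
    resolved = str(Path(tessdata_dir).expanduser().resolve())
    # Child processes started later from this script inherit os.environ,
    # so this has to run before the first call that launches Tesseract.
    os.environ["TESSDATA_PREFIX"] = resolved
    return resolved


# Placeholder directory for illustration only; use the real tessdata folder.
print(set_tessdata_prefix("."))
```

The call only affects the current Python process and the programs it starts; it does not change the global shell environment.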

Larbo53 commented 4 months ago

I've just seen that the Tesseract version is 5.2. Maybe that's where the problem lies. Thank you.

Larbo53 commented 4 months ago

I set os.environ["TESSDATA_PREFIX"] = 'usr/local/bin/tesseract-lang/4.1.0/share/tessdata/' and got this error message: Failed loading language 'eng'. Tesseract couldn't load any languages! Could not initialize tesseract.

stefan6419846 commented 4 months ago

In this case it seems like the English language data is not available, which AFAIK is always required.
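Whether the English data is present can be checked by looking for eng.traineddata in the data directory. The sketch below is an illustration with made-up helper names (available_traineddata, has_english) and a throwaway temporary directory standing in for the real tessdata folder. Note that a path without a leading slash is relative to the current working directory, so the directory may simply not exist where the script runs.

```python
import tempfile
from pathlib import Path


def available_traineddata(tessdata_dir: str) -> list[str]:
    """Return the language codes that have a .traineddata file in the directory."""
    directory = Path(tessdata_dir)
    if not directory.is_dir():
        # A missing directory or a mistyped path yields no languages at all.
        return []
    return sorted(p.stem for p in directory.glob("*.traineddata"))


def has_english(tessdata_dir: str) -> bool:
    """True if eng.traineddata exists in the given directory."""
    return "eng" in available_traineddata(tessdata_dir)


# Demo with a temporary directory instead of a real tessdata folder.
with tempfile.TemporaryDirectory() as demo_dir:
    Path(demo_dir, "fra.traineddata").touch()
    print(has_english(demo_dir))  # False: only fra.traineddata is present
    Path(demo_dir, "eng.traineddata").touch()
    print(has_english(demo_dir))  # True: eng.traineddata now exists
```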

Larbo53 commented 4 months ago

I'll uninstall Tesseract and then reinstall it. Thank you

Larbo53 commented 4 months ago

Hi,

After reinstalling everything, Tesseract is now operational. Thanks a lot for your help. Best regards.