tesseract-ocr / tessdata

Trained models with fast variant of the "best" LSTM models + legacy models
Apache License 2.0

Failed loading language 'eng' #181

Closed: Alookima21 closed this issue 8 months ago

Alookima21 commented 8 months ago

I installed Tesseract with Homebrew on my M1 Pro, and my current version is:

tesseract 5.3.4
 leptonica-1.84.1
  libgif 5.2.1 : libjpeg 8d (libjpeg-turbo 3.0.0) : libpng 1.6.43 : libtiff 4.6.0 : zlib 1.2.12 : libwebp 1.3.2 : libopenjp2 2.5.2
 Found NEON
 Found libarchive 3.7.2 zlib/1.2.12 liblzma/5.4.4 bz2lib/1.0.8 liblz4/1.9.4 libzstd/1.5.5
 Found libcurl/8.4.0 SecureTransport (LibreSSL/3.3.6) zlib/1.2.12 nghttp2/1.55.1

I have verified that eng.traineddata is installed and that the path is correct. However, when I run Tesseract from my code, I get this error:

Error: Tesseract (legacy) engine requested, but components are not present in /opt/homebrew/share/tessdata/eng.traineddata!!
Failed loading language 'eng'
Tesseract couldn't load any languages!
Could not initialize tesseract.

I have uninstalled and reinstalled Tesseract, and I have even tried manually downloading the eng model from the GitHub repo into a custom directory and pointing Tesseract at it, but I get the same error. I was hoping someone could help with this.

stweil commented 8 months ago

Please use the Tesseract user forum for questions.

Typically this error occurs because users downloaded not eng.traineddata itself but the web page for that file. Try opening /opt/homebrew/share/tessdata/eng.traineddata in a text editor. Is it HTML code? Then that's the reason.
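The text-editor check can also be scripted. Below is a minimal sketch, assuming only that a saved web page starts (after optional whitespace) with `<!DOCTYPE html` or `<html`; the helper names `looks_like_html` and `check_model_file` are hypothetical and not part of Tesseract.

```python
# Hypothetical helpers: sniff the first bytes of a downloaded model file to
# see whether it is actually an HTML page instead of binary model data.
HTML_MARKERS = (b"<!doctype html", b"<html")


def looks_like_html(data: bytes) -> bool:
    # Assumption: an HTML page begins with one of the markers above,
    # possibly preceded by whitespace; compare case-insensitively.
    head = data[:512].lstrip().lower()
    return head.startswith(HTML_MARKERS)


def check_model_file(path) -> str:
    # Only the first few hundred bytes are needed for the check.
    with open(path, "rb") as fh:
        head = fh.read(512)
    if looks_like_html(head):
        return "HTML page, not a model: download the raw file instead"
    return "binary data (not HTML)"
```

For example, `check_model_file("/opt/homebrew/share/tessdata/eng.traineddata")`. A real .traineddata file is binary, so the HTML branch should only fire when the web page was saved by mistake.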

stweil commented 8 months ago

Ah, sorry, I (and you) should have read the error message. "Tesseract (legacy) engine requested, but components are not present" gives the reason for the failure. You have installed a fast model that includes a neural network for the LSTM engine but does not support the legacy OCR engine.
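To see which engines a given .traineddata file can serve, one can read its component table. This is a minimal sketch under stated assumptions: the layout (a 32-bit entry count followed by one signed 64-bit offset per component, with -1 for a missing component) and the component indices are taken from my reading of Tesseract's `tessdatamanager` source and should be checked against your version. `combine_tessdata -d eng.traineddata` lists the components actually present and is the authoritative check. All function names here are hypothetical.

```python
import struct

# ASSUMPTION: indices from the TessdataType enum in Tesseract's
# tessdatamanager.h; verify against your Tesseract version.
TESSDATA_UNICHARSET = 1
TESSDATA_INTTEMP = 3
TESSDATA_LSTM = 17
# ASSUMPTION: sanity limit on the entry count, as Tesseract uses.
MAX_ENTRIES = 1000


def read_offsets(header: bytes) -> list:
    """Parse the component offset table at the start of a .traineddata file."""
    if len(header) < 4:
        raise ValueError("too short to be a .traineddata file")
    # The entry count is 32-bit; an implausibly large value in one byte
    # order suggests the file was written in the other one.
    for endian in ("<", ">"):
        (count,) = struct.unpack_from(endian + "I", header, 0)
        if count <= MAX_ENTRIES:
            break
    else:
        raise ValueError("implausible entry count (HTML page or corrupt file?)")
    if len(header) < 4 + 8 * count:
        raise ValueError("truncated offset table")
    # One signed 64-bit offset per component; -1 marks a missing component.
    return list(struct.unpack_from(f"{endian}{count}q", header, 4))


def present(offsets: list, index: int) -> bool:
    return index < len(offsets) and offsets[index] >= 0


def engines_from_header(header: bytes) -> dict:
    offsets = read_offsets(header)
    return {
        # ASSUMPTION: the legacy engine needs the unicharset and inttemp components.
        "legacy": present(offsets, TESSDATA_UNICHARSET)
        and present(offsets, TESSDATA_INTTEMP),
        # The LSTM engine needs the LSTM network component.
        "lstm": present(offsets, TESSDATA_LSTM),
    }


def engines_available(path) -> dict:
    with open(path, "rb") as fh:
        # The header is at most 4 + 8 * MAX_ENTRIES bytes long.
        return engines_from_header(fh.read(4 + 8 * MAX_ENTRIES))


def choose_oem(engines: dict, want_legacy: bool = True) -> int:
    """Pick a value for tesseract's --oem flag (0 = legacy only, 1 = LSTM only)."""
    if want_legacy and engines["legacy"]:
        return 0
    if engines["lstm"]:
        return 1
    if engines["legacy"]:
        return 0
    raise ValueError("model supports neither engine")
```

For the fast model described above, `engines_available("/opt/homebrew/share/tessdata/eng.traineddata")` would be expected to report `"legacy": False`. In that case, passing `--oem 1` (LSTM only) to tesseract, or through whatever wrapper your code uses, avoids the error. Alternatively, replace the file with a model from this repository, which bundles the legacy components.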