madmaze / pytesseract

A Python wrapper for Google Tesseract
Apache License 2.0
5.82k stars, 721 forks

get_languages is not supported in conjunction with --tessdata-dir (config) #553

Closed: KaipaUday closed this issue 3 months ago

KaipaUday commented 3 months ago

I tried to list the available languages while specifying --tessdata-dir in the config, but pytesseract does not seem to take it into account: the returned list is the same with or without the option.

PS C:\Users\xxx\Desktop\xxx> python
Python 3.12.3 (tags/v3.12.3:f6650f9, Apr  9 2024, 14:05:25) [MSC v.1938 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> from pytesseract.pytesseract import *
>>> get_languages()
['bhu', 'eng', 'lets']
>>> get_languages(config=r"--tessdata-dir C:\Users\xxx\Desktop\xxx\config\tesseract_tessdata")
['bhu', 'eng', 'lets']
>>> get_languages(config=r"--tessdata-dir C:\Users\xxx\Desktop\xxx\config\tesseract_tessdata")
['bhu', 'eng', 'lets']
>>> exit()              

When I run the equivalent command directly in PowerShell, it works:


PS C:\Users\xxx\Desktop\xxx> tesseract --tessdata-dir C:\Users\KUD5RT\Desktop\xxx\config\tesseract_tessdata --list-langs
List of available languages in "C:\Users\xxx\Desktop\xxx\config\tesseract_tessdata/" (2):
bhu
bhu32

KaipaUday commented 3 months ago

Apparently the following call works. The difference is that the path is wrapped in double quotes inside the config string:

get_languages(config=r'--tessdata-dir "C:\Users\KUD5RT\Desktop\bhuimageanalyser\config\tesseract_tessdata/" -l bhu --oem 1 --psm 6 -c tessedit_char_whitelist=0123456789.:')
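A possible explanation for why quoting matters, assuming (this is not confirmed anywhere in this thread) that pytesseract tokenizes the config string with Python's shlex.split in its default POSIX mode. In that mode, a backslash outside quotes is treated as an escape character and dropped, so an unquoted Windows path loses its separators; inside double quotes the backslashes survive. The sketch below uses only the standard library. The path and the tessdata_config helper are hypothetical and exist only for illustration.

```python
import shlex
from pathlib import PureWindowsPath

# Hypothetical path, used only for illustration.
unquoted = r"--tessdata-dir C:\data\tessdata"
quoted = r'--tessdata-dir "C:\data\tessdata"'

# Outside quotes, POSIX-mode shlex consumes each backslash as an escape.
unquoted_args = shlex.split(unquoted)
print(unquoted_args)  # ['--tessdata-dir', 'C:datatessdata']

# Inside double quotes, backslashes before ordinary characters are kept.
quoted_args = shlex.split(quoted)
print(quoted_args)  # ['--tessdata-dir', 'C:\\data\\tessdata']


def tessdata_config(tessdata_dir: str) -> str:
    """Hypothetical helper: build a config string whose path is quoted
    and uses forward slashes, so no backslash escaping is involved."""
    posix_style = PureWindowsPath(tessdata_dir).as_posix()
    return f'--tessdata-dir "{posix_style}"'


config = tessdata_config(r"C:\data\tessdata")
print(config)  # --tessdata-dir "C:/data/tessdata"
```

If the assumption holds, passing the string returned by tessdata_config as the config argument of get_languages should avoid the problem, since the path reaches tesseract intact either way.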