madmaze / pytesseract

A Python wrapper for Google Tesseract
Apache License 2.0
5.82k stars 721 forks

get_languages() does not return languages that have numbers in the name. #552

Open KaipaUday opened 3 months ago

KaipaUday commented 3 months ago

Hi, there is a strange issue: when I call get_languages(), it only returns some of the installed languages.


>>> from pytesseract.pytesseract import *
>>> get_languages()                         
['bhu', 'eng', 'lets']

When I run the same query in cmd or PowerShell, all six languages are listed:

PS C:\Users\xxxxT\Desktop\XXXX> tesseract --list-langs
List of available languages in "C:\Users\xxxxT\AppData\Local\Programs\Tesseract-OCR/tessdata/" (6):
7seg
bhu
bhu32_7seg
bhu32_eng
eng
lets
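As a possible workaround, the CLI output above can be parsed directly instead of going through get_languages(). This is a minimal sketch that assumes the first line of `tesseract --list-langs` output is the "List of available languages ..." header and that every following non-empty line is one language name. `parse_list_langs` and `list_all_languages` are hypothetical helpers, not part of pytesseract.

```python
from __future__ import annotations

import subprocess


def parse_list_langs(output: str) -> list[str]:
    """Parse `tesseract --list-langs` output, keeping every language name.

    Assumes a single "List of available languages ..." header line,
    followed by one language name per line.
    """
    languages = []
    for line in output.splitlines():  # handles both \n and \r\n
        name = line.strip()
        # Skip blank lines and the header; keep names with digits too.
        if not name or name.startswith("List of available languages"):
            continue
        languages.append(name)
    return languages


def list_all_languages(tesseract_cmd: str = "tesseract") -> list[str]:
    """Hypothetical helper: query the Tesseract CLI directly."""
    result = subprocess.run(
        [tesseract_cmd, "--list-langs"],
        capture_output=True,
        text=True,
        check=True,
    )
    # Assumes the list is written to stdout, as in the PowerShell session above.
    return parse_list_langs(result.stdout)
```

If a custom executable path is configured, it can be passed through, e.g. `list_all_languages(pytesseract.pytesseract.tesseract_cmd)`. For the output shown above, this should return all six names, including `7seg`, `bhu32_7seg` and `bhu32_eng`.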

stefan6419846 commented 3 months ago

Languages like these are currently being filtered out; the filter was introduced in #312: https://github.com/madmaze/pytesseract/blob/c8359454aabb47eeb0360f7017ba58a786366a4e/pytesseract/pytesseract.py#L51
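To illustrate why exactly these three names disappear, here is a small reproduction. It assumes the filter is a regular expression that only accepts lowercase letters and underscores (written here as `^[a-z_]+$`); that pattern is a guess consistent with the output reported above, not copied from the linked source. The relaxed pattern that also allows digits is likewise hypothetical. Both still reject the header line, which is presumably what the filter is meant to drop.

```python
from __future__ import annotations

import re

# Assumed shape of the current filter: lowercase letters and underscores only.
STRICT_PATTERN = re.compile(r"^[a-z_]+$")

# Hypothetical relaxed filter that also accepts digits.
RELAXED_PATTERN = re.compile(r"^[a-z0-9_]+$")

# The CLI output from this issue, with the tessdata path shortened.
LIST_LANGS_OUTPUT = """List of available languages in "tessdata/" (6):
7seg
bhu
bhu32_7seg
bhu32_eng
eng
lets"""


def filter_languages(output: str, pattern: re.Pattern) -> list[str]:
    """Keep only the stripped lines of the output that match `pattern`."""
    return [line.strip() for line in output.splitlines() if pattern.match(line.strip())]


strict = filter_languages(LIST_LANGS_OUTPUT, STRICT_PATTERN)
relaxed = filter_languages(LIST_LANGS_OUTPUT, RELAXED_PATTERN)
print(strict)   # ['bhu', 'eng', 'lets']
print(relaxed)  # ['7seg', 'bhu', 'bhu32_7seg', 'bhu32_eng', 'eng', 'lets']
```

The strict result matches what get_languages() returned in the report, which suggests that any name containing a digit is rejected by the filter.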