Closed (metouitude closed this 2 weeks ago)
Have you tried to run Tesseract with, say, lang = ara+eng?
Actually, I found a better approach via ChatGPT: apt install tesseract-ocr-all installs all the language packs, so now I can detect multiple languages within a single image.
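For anyone landing here later, this is a minimal sketch of the lang = ara+eng suggestion combined with the installed language packs. The helper names build_lang_arg and ocr_multilang are made up for illustration, and it assumes a pytesseract version recent enough to provide get_languages; treat it as a starting point rather than a tested recipe.

```python
def build_lang_arg(wanted, installed):
    """Join the wanted language codes that are actually installed, e.g. 'ara+eng'."""
    available = [code for code in wanted if code in installed]
    if not available:
        raise ValueError(f"none of {list(wanted)} are installed")
    return "+".join(available)


def ocr_multilang(image, wanted=("ara", "eng")):
    """Run OCR with several languages at once (hypothetical helper)."""
    # Imported here so build_lang_arg stays usable without Tesseract installed.
    import pytesseract

    installed = pytesseract.get_languages(config="")
    return pytesseract.image_to_string(image, lang=build_lang_arg(wanted, installed))
```

Something like ocr_multilang(Image.open("page.png")) would then run with both packs, assuming Pillow is used to load the image and "page.png" is a placeholder path. The order of the codes matters to Tesseract, so put the language you expect most first.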
Your Feature Request
Hello,
I'm currently working on a personal project that involves detecting multiple languages, and the furthest I've got is:
import re
import pytesseract
# Raw strings keep \d and \. from being treated as string escapes.
osd = pytesseract.image_to_osd(self.img)
script = re.search(r"Script: ([a-zA-Z]+)\n", osd).group(1)
conf = re.search(r"Script confidence: (\d+\.?(\d+)?)", osd).group(1)
To be honest, this is taken directly from https://stackoverflow.com/questions/70198974/how-to-detect-language-or-script-from-an-input-image-using-python-or-tesseract-o
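A slightly more defensive version of that parsing step might look like the sketch below. It assumes the OSD text contains "Script: ..." and "Script confidence: ..." lines as in the snippet above; parse_osd is a hypothetical helper name, not part of pytesseract.

```python
import re


def parse_osd(osd_text):
    """Return (script, confidence) from OSD text, or None for a missing field."""
    script = re.search(r"Script: ([a-zA-Z]+)", osd_text)
    conf = re.search(r"Script confidence: (\d+(?:\.\d+)?)", osd_text)
    return (
        script.group(1) if script else None,
        float(conf.group(1)) if conf else None,
    )
```

Calling parse_osd(pytesseract.image_to_osd(img)) avoids an AttributeError when a field is absent, and returns the confidence as a float so it can be compared against a threshold.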
So, for example, say we have an image with two or more languages, like this one:
In this case, OSD detects only Latin, with a confidence of 2.22,
but at the same time
pytesseract.image_to_boxes(self.img, lang="ara")
returns Arabic text. My point is: