sirfz / tesserocr

A Python wrapper for the tesseract-ocr API
MIT License

Problem with GetRegions #228

Open marcotn opened 4 years ago

marcotn commented 4 years ago

I am running a piece of code like this, with version 4.1.1:

from PIL import Image
from tesserocr import PyTessBaseAPI, PSM

api = PyTessBaseAPI(psm=PSM.AUTO_OSD)
api_region = PyTessBaseAPI(psm=PSM.AUTO_OSD)

def image_ocr_boxes(img):
    api.Clear()
    print(api.Version())
    image_ram = Image.open(img)
    api.SetImage(image_ram)
    api.Recognize()
    counter = 0  # index used to name the saved region images
    for region in api.GetRegions():
        # region is an (image, bbox) tuple; region[0] is the region image
        api_region.SetImage(region[0])
        api_region.Recognize()
        text = api_region.GetUTF8Text()
        region[0].save(f"boxes/img_{counter}.jpg")
        counter += 1
        api_region.Clear()

I wrote this to save an image of each region, so that I could understand why the text contained in a region was "kinda cropped".

When I save each region with region[0].save(), the saved images do look cropped, or at least much smaller than the box I find in the region tuple.

I have a feeling that there is a problem with the coordinates: in one place they are stored as (x, y, w, h), but Image expects something different.
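To make the suspected mismatch concrete, here is a minimal sketch of the conversion between the two conventions. It assumes the bounding box in the region tuple is a dict with the keys x, y, w and h, and relies on the fact that Pillow's Image.crop takes a (left, upper, right, lower) tuple. The helper name bbox_to_pil_box is hypothetical and not part of tesserocr.

```python
def bbox_to_pil_box(bbox):
    """Convert an x/y/w/h dict into a (left, upper, right, lower) tuple.

    Assumption: bbox looks like {"x": ..., "y": ..., "w": ..., "h": ...}.
    The returned tuple is the box layout that Pillow's Image.crop expects.
    """
    left = bbox["x"]
    upper = bbox["y"]
    right = left + bbox["w"]    # width becomes an absolute right edge
    lower = upper + bbox["h"]   # height becomes an absolute lower edge
    return (left, upper, right, lower)


# Made-up example values, only to show the arithmetic
region_bbox = {"x": 10, "y": 20, "w": 30, "h": 40}
pil_box = bbox_to_pil_box(region_bbox)
print(pil_box)
```

If the w and h values were passed to Image.crop unchanged, as if they were the right and lower edges, the crop would come out smaller than intended whenever x and y are non-zero. Whether that is what happens here is only a guess.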

Is anybody else having the same problem?

sirfz commented 4 years ago

You're better off posting this on StackOverflow to get help with the tesseract API or its behavior. I'll keep the issue open for the time being for visibility.

bertsky commented 3 years ago

api = PyTessBaseAPI(psm=PSM.AUTO_OSD)
api_region = PyTessBaseAPI(psm=PSM.AUTO_OSD)

At this point you have initialized two independent instances of Tesseract, which both loaded the default lang='eng' LSTM model. (At least one model is needed, even for segmentation.)

image_ram = Image.open(img)
api.SetImage(image_ram)

If you have image files anyway, you can skip the Pillow step and just use api.SetImageFile directly (which is based on Leptonica's own pix image format).
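A minimal sketch of that shortcut, assuming tesserocr is installed together with the eng and osd models. The function name ocr_file and the lang default are illustrative only; the import sits inside the function so that the definition itself does not require tesserocr.

```python
def ocr_file(path, lang="eng"):
    """Run OCR on an image file without going through Pillow.

    Hypothetical helper: only PyTessBaseAPI, PSM, SetImageFile and
    GetUTF8Text come from tesserocr; the rest is illustrative.
    """
    # Imported here so that defining the function does not need tesserocr
    from tesserocr import PyTessBaseAPI, PSM

    # The context manager releases the Tesseract instance on exit
    with PyTessBaseAPI(lang=lang, psm=PSM.AUTO_OSD) as api:
        api.SetImageFile(path)      # Tesseract reads the file itself
        return api.GetUTF8Text()    # triggers recognition if needed
```

It would be called as ocr_file("page.png"), where the file name is a placeholder.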

for region in api.GetRegions():
    api_region.SetImage(region[0])

That's quite a unique pattern you have invented here! So you make the api Tesseract instance give you PIL.Image / bbox tuples, the former of which you then pass on to the api_region Tesseract instance for recognition.

I don't fully grasp why you came up with that, but there are a couple of issues here: