sirfz / tesserocr

A Python wrapper for the tesseract-ocr API
MIT License

How do you assign OCR output to a variable in a for loop? #285

Closed ParnoldAlmer closed 3 years ago

ParnoldAlmer commented 3 years ago

A method for using tesserocr with mss and NumPy arrays via SetImageBytes:

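    from tesserocr import PyTessBaseAPI

    images = [im_datapoint1, im_datapoint2, im_datapoint3, im_datapoint4]
    ocr_data = [datapoint1, datapoint2, datapoint3, datapoint4]

    with PyTessBaseAPI(psm=6) as api:
        for (img, ocrd) in zip(images, ocr_data):
            # 1 byte per pixel: assumes a single-channel 8-bit (grayscale) array
            api.SetImageBytes(img.tobytes(), img.shape[1], img.shape[0], 1, img.shape[1])
            # Rebinds the loop name `ocrd` only; ocr_data itself is not modified
            ocrd = api.GetUTF8Text()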

This is my first big Python project and I'm really stuck here; any advice? I'm trying to assign the OCR output to a variable. When I run the code, the OCR itself works, but the result never gets assigned to the variable. I'm sure the library works as intended; the main issue is that I'm still learning. I've spent hours googling this and asking for help on Discord, and it seems to be more advanced than I thought. There aren't many tesserocr examples out there, and I couldn't find anything for my case. I'm trying to do real-time OCR on 5 variables that change.
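Why the assignment in the first snippet is lost: inside `for (img, ocrd) in zip(...)`, `ocrd` is just a local name bound to the current element, so `ocrd = api.GetUTF8Text()` points that name at a new string and leaves `ocr_data` (and `datapoint1`, etc.) unchanged. A minimal sketch that reproduces this without Tesseract, assuming `fake_ocr` as a hypothetical stand-in for the `SetImageBytes`/`GetUTF8Text` pair and placeholder strings for the images:

```python
def fake_ocr(img):
    # Hypothetical stand-in for api.SetImageBytes(...) + api.GetUTF8Text()
    return f"text from {img}"

images = ["img1", "img2", "img3"]
ocr_data = [None, None, None]

# Broken: rebinding the loop name does not touch the list element
for img, ocrd in zip(images, ocr_data):
    ocrd = fake_ocr(img)
print(ocr_data)  # [None, None, None]

# Fix 1: collect the results into a new list
results = [fake_ocr(img) for img in images]

# Fix 2: write back into the existing list by index
for i, img in enumerate(images):
    ocr_data[i] = fake_ocr(img)

# Fix 3: unpack into separate variables after the loop
datapoint1, datapoint2, datapoint3 = results
print(datapoint1)  # text from img1
```

The same pattern applies inside the `with PyTessBaseAPI(...)` block: call `GetUTF8Text()` and store its return value in a list, a dict, or by index, rather than assigning to the loop variable.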

ParnoldAlmer commented 3 years ago

I now understand that `print(result)` prints a dict of the results; hopefully I just need to access that dict and work with its values.


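    import concurrent.futures
    from tesserocr import PyTessBaseAPI

    def ocr_img(img, ocrd):
        # Each call creates its own API instance, so threads never share one
        with PyTessBaseAPI(psm=6) as api:
            api.SetImageBytes(img.tobytes(), img.shape[1], img.shape[0], 1, img.shape[1])
            return api.GetUTF8Text()

    with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
        # Map each future back to the index of the image it processes
        future_to_idx = {executor.submit(ocr_img, img, ocrd): idx
                         for idx, (img, ocrd) in enumerate(zip(images, ocr_data))}
        result = {}
        # as_completed yields futures in completion order, hence the index lookup
        for future in concurrent.futures.as_completed(future_to_idx):
            idx = future_to_idx[future]
            try:
                result[idx] = future.result()  # store the OCR text under its index
            except Exception as e:
                print(f'pair {idx} generated an exception: {e}')

    print(result)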
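If all that is needed is one string per image in input order, `executor.map` may be simpler than `as_completed`: it returns results in the order of the inputs regardless of which thread finishes first, so they can be unpacked straight into named variables. A minimal sketch, assuming `fake_ocr` as a hypothetical stand-in for `ocr_img` and integers as placeholder images:

```python
import concurrent.futures
import time

def fake_ocr(img):
    # Hypothetical stand-in for ocr_img; later inputs sleep less,
    # so the calls finish in reverse order
    time.sleep(0.01 * (3 - img))
    return f"text {img}"

images = [0, 1, 2, 3]

with concurrent.futures.ThreadPoolExecutor(max_workers=4) as executor:
    # map() yields results in the order of `images`, not completion order
    texts = list(executor.map(fake_ocr, images))

datapoint1, datapoint2, datapoint3, datapoint4 = texts
print(texts)  # ['text 0', 'text 1', 'text 2', 'text 3']
```

One trade-off: if any call raises, `map` re-raises that exception when its result is reached, whereas the `as_completed` loop lets each failure be handled per image.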
ParnoldAlmer commented 3 years ago

I got it working. If you want to see a working example: https://github.com/drksun/ocr-gspro-interface/blob/main/ocr6.py

It stores the data in a dict, and you use `result.get(key)` to get the contents.
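Keying the dict by descriptive names instead of integer indices can make those lookups self-documenting, and `dict.get` returns `None` (or a supplied default) instead of raising `KeyError` when a key is missing, for example when OCR failed for one region in the thread-pool version. A minimal sketch; the region names and `fake_ocr` are hypothetical placeholders:

```python
def fake_ocr(img):
    # Hypothetical stand-in for the OCR call
    return f"text from {img}"

# Hypothetical region names mapped to placeholder images
regions = {"speed": "img_a", "distance": "img_b", "angle": "img_c"}

result = {name: fake_ocr(img) for name, img in regions.items()}

print(result["speed"])          # text from img_a
print(result.get("distance"))   # text from img_b
print(result.get("spin"))       # None: missing key, no KeyError
print(result.get("spin", ""))   # prints an empty line (the default)
```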