OCR on bounding boxes of an image

I don't think you need TrOCR itself for that step. All you need is a small utility function, written with OpenCV or any other library you prefer, that crops each bounding box out of the image, like this:

import cv2  # only needed to load the image (e.g. cv2.imread); the cropping below is plain NumPy slicing

def crop_images_from_bounding_boxes(image, bounding_boxes):
    """
    Crops images from the original image based on the provided bounding boxes.

    Parameters:
    image (numpy.ndarray): The original image.
    bounding_boxes (list of tuples): A list of bounding boxes, where each bounding box is represented by a tuple
                                     (x, y, width, height).

    Returns:
    list of numpy.ndarray: A list of cropped images.
    """
    cropped_images = []

    for (x, y, w, h) in bounding_boxes:
        # NumPy images are indexed [row, column], i.e. [y, x], so the y range comes first.
        cropped_image = image[y:y+h, x:x+w]
        cropped_images.append(cropped_image)

    return cropped_images
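
A quick way to sanity-check the function above is sketched below. It assumes a synthetic NumPy array in place of a real image loaded with cv2.imread, and the image size and box coordinates are made up for illustration. The second box deliberately runs past the image border to show that NumPy slicing silently truncates it instead of raising an error.

```python
import numpy as np


def crop_images_from_bounding_boxes(image, bounding_boxes):
    # Same logic as the function above: rows (y) first, then columns (x).
    return [image[y:y + h, x:x + w] for (x, y, w, h) in bounding_boxes]


# Synthetic 100x200 (height x width) 3-channel image standing in for cv2.imread(...).
image = np.zeros((100, 200, 3), dtype=np.uint8)

# Hypothetical boxes in (x, y, width, height) format.
# The second one extends beyond the right and bottom edges of the image.
boxes = [(10, 20, 50, 30), (180, 90, 40, 40)]

crops = crop_images_from_bounding_boxes(image, boxes)
shapes = [crop.shape for crop in crops]
print(shapes)  # [(30, 50, 3), (10, 20, 3)]
```

Note that each crop's shape is (height, width, channels), and that the out-of-bounds box comes back smaller than requested. If your boxes can fall outside the image or have negative coordinates, you may want to clamp them first.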

If you are worried about performance, rest assured: each crop is smaller than the full image, so running OCR on it will be faster, at least a bit. If you don't have a GPU, you can use multi-threading to process the crops in parallel. If you do have a GPU, this will be really fast.
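
The multi-threading suggestion above could look roughly like this. It is only a sketch: `ocr_crops_in_parallel` and `dummy_ocr` are hypothetical names, and `dummy_ocr` is a stand-in that you would replace with your actual recognizer (for example a function wrapping a TrOCR call). The image and boxes are synthetic.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def ocr_crops_in_parallel(crops, ocr_fn, max_workers=4):
    """Run ocr_fn on every crop in a thread pool; results keep the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        return list(executor.map(ocr_fn, crops))


def dummy_ocr(crop):
    # Placeholder for a real recognizer: just reports the crop size as text.
    height, width = crop.shape[:2]
    return f"{width}x{height}"


# Synthetic image and hypothetical (x, y, width, height) boxes.
image = np.zeros((100, 200, 3), dtype=np.uint8)
boxes = [(10, 20, 50, 30), (60, 40, 80, 20)]
crops = [image[y:y + h, x:x + w] for (x, y, w, h) in boxes]

results = ocr_crops_in_parallel(crops, dummy_ocr)
print(results)  # ['50x30', '80x20']
```

Threads tend to help mainly when the recognizer spends its time in native code that releases Python's GIL. If yours does not, a process pool or batching the crops may be a better fit.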

I hope this answers your question.

microsoft/unilm, issue #1564