microsoft / unilm

Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
https://aka.ms/GeneralAI
MIT License
19.08k stars 2.43k forks

OCR on bounding boxes of an image #1564

Open cometta opened 1 month ago

cometta commented 1 month ago

I want to know if it's possible to input multiple bounding boxes and have TrOCR perform OCR only on those specified areas of my image. Could you please advise on this?

maifeeulasad commented 1 month ago

I don't think TrOCR does that itself. All you need is a utility function, written with OpenCV or some other preferred library, that crops the regions first, like this:

import cv2

def crop_images_from_bounding_boxes(image, bounding_boxes):
    """
    Crops images from the original image based on the provided bounding boxes.

    Parameters:
    image (numpy.ndarray): The original image.
    bounding_boxes (list of tuples): A list of bounding boxes, where each bounding box is represented by a tuple
                                     (x, y, width, height).

    Returns:
    list of numpy.ndarray: A list of cropped images.
    """
    cropped_images = []

    for (x, y, w, h) in bounding_boxes:
        # NumPy images are indexed rows first, so y (rows) comes before x (columns).
        cropped_image = image[y:y+h, x:x+w]
        cropped_images.append(cropped_image)

    return cropped_images
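
One thing to watch with the slicing above: if a box starts at a negative coordinate or runs past the image edge, NumPy will not raise an error, it will just return a wrong or empty crop. A minimal sketch of a guard, assuming the same (x, y, width, height) box format; `clip_boxes` is a hypothetical helper name, not part of OpenCV or TrOCR:

```python
def clip_boxes(bounding_boxes, image_width, image_height):
    """Clip (x, y, w, h) boxes to the image bounds and drop boxes that end up empty."""
    clipped = []
    for (x, y, w, h) in bounding_boxes:
        # Clamp the top-left and bottom-right corners to the image area.
        x1 = max(0, x)
        y1 = max(0, y)
        x2 = min(image_width, x + w)
        y2 = min(image_height, y + h)
        # Keep the box only if some area is left after clamping.
        if x2 > x1 and y2 > y1:
            clipped.append((x1, y1, x2 - x1, y2 - y1))
    return clipped
```

For a NumPy image, the height and width come from `image.shape[:2]` (in that order), so you could call `clip_boxes(boxes, image.shape[1], image.shape[0])` before cropping.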

If you are worried about performance, rest assured: each crop is smaller than the full image, so OCR on the crops should be faster, at least a bit. If you don't have a GPU, you can use multi-threading to process the crops in parallel. If you have a GPU, this will be fast as hell.
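
A rough sketch of the multi-threading idea, using only the standard library. Here `recognize` is a hypothetical placeholder for whatever function runs OCR on a single crop (for example, your own wrapper around a TrOCR call), and `recognize_all` is just an illustrative name. Whether threads actually give a speedup depends on the OCR backend, so treat this as a starting point:

```python
from concurrent.futures import ThreadPoolExecutor

def recognize_all(cropped_images, recognize, max_workers=4):
    """Run `recognize` on every crop in a thread pool.

    `recognize` is a caller-supplied function (an assumption here) that takes
    one cropped image and returns its text. Results come back in the same
    order as the input crops.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # executor.map preserves input order even if tasks finish out of order.
        return list(executor.map(recognize, cropped_images))
```

You would pass in the list returned by `crop_images_from_bounding_boxes` together with your own single-image OCR function.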

I hope this answers your question.