Open cometta opened 1 month ago
I don't think TrOCR has built-in support for that. All you need is a utility function with OpenCV
or some other preferred library, like this:
import cv2

def crop_images_from_bounding_boxes(image, bounding_boxes):
    """
    Crops images from the original image based on the provided bounding boxes.

    Parameters:
        image (numpy.ndarray): The original image.
        bounding_boxes (list of tuples): A list of bounding boxes, where each bounding box is represented by a tuple
            (x, y, width, height).

    Returns:
        list of numpy.ndarray: A list of cropped images.
    """
    cropped_images = []
    for (x, y, w, h) in bounding_boxes:
        # NumPy images are indexed as [row, column], i.e. [y, x].
        cropped_image = image[y:y+h, x:x+w]
        cropped_images.append(cropped_image)
    return cropped_images
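To connect the crops to TrOCR, something along these lines should work. This is only a sketch, assuming the Hugging Face `transformers` package and the `microsoft/trocr-base-printed` checkpoint; the helper names `batched`, `ocr_crops`, and `load_trocr` are made up for illustration and are not part of any library.

```python
def batched(items, batch_size):
    """Yield consecutive slices of `items` with at most `batch_size` elements."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def ocr_crops(crops, processor, model, batch_size=8):
    """Run OCR on a list of RGB crops and return one string per crop.

    `processor` and `model` are expected to behave like TrOCRProcessor and
    VisionEncoderDecoderModel from transformers (hypothetical helper).
    """
    texts = []
    for batch in batched(crops, batch_size):
        # The processor resizes and normalizes each crop into a tensor batch.
        pixel_values = processor(images=batch, return_tensors="pt").pixel_values
        # Generate token ids for every crop in the batch at once.
        generated_ids = model.generate(pixel_values)
        texts.extend(processor.batch_decode(generated_ids, skip_special_tokens=True))
    return texts


def load_trocr(checkpoint="microsoft/trocr-base-printed"):
    """Load processor and model; the checkpoint name is an assumption."""
    # Imported here so the helpers above can be used without transformers installed.
    from transformers import TrOCRProcessor, VisionEncoderDecoderModel

    processor = TrOCRProcessor.from_pretrained(checkpoint)
    model = VisionEncoderDecoderModel.from_pretrained(checkpoint)
    return processor, model
```

Two things to keep in mind: `cv2.imread` returns BGR images, so convert with `cv2.cvtColor(image, cv2.COLOR_BGR2RGB)` before cropping, and TrOCR is trained on single text lines, so each bounding box should ideally cover one line of text. After that, `processor, model = load_trocr()` followed by `ocr_crops(crops, processor, model)` gives one string per box, in the same order as the boxes.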
If you are worried about performance, rest assured: because the model only sees the cropped regions instead of the whole image, it should be faster, at least a bit. If you don't have a GPU, you can use multi-threading to process the crops in parallel. If you have a GPU, this will be fast as hell.
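For the multi-threading part, here is a rough sketch using only the standard library. `recognize` is a placeholder for whatever function turns one crop into text (it is not a TrOCR API), and `ocr_crops_parallel` is a made-up name. Whether threads actually speed things up depends on the OCR backend releasing the GIL during inference, so treat this as something to benchmark rather than a guarantee.

```python
from concurrent.futures import ThreadPoolExecutor


def ocr_crops_parallel(crops, recognize, max_workers=4):
    """Apply `recognize` to every crop using a thread pool.

    `recognize` is any callable taking one crop and returning its text
    (hypothetical; plug in your own OCR call).
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map returns results in the same order as the inputs,
        # so texts[i] still corresponds to bounding_boxes[i].
        return list(pool.map(recognize, crops))
```

Because the results come back in input order, you can zip them directly with the original bounding boxes.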
I hope this answers your question.
I want to know if it's possible to input multiple bounding boxes and have TrOCR perform OCR only on those specified areas of my image. Could you please advise on this?