I would like to propose a vectorized bounding-box cropping layer that accepts an image (or a batch of images) and a tensor of boxes, and returns a ragged tensor of the crops taken from the original image, with shape [batch, n_boxes, None, None, 3]. The height and width dimensions are ragged because each crop has a different size.
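To make the proposed output shape concrete, here is a minimal NumPy sketch of the intended semantics for a single image (the batch dimension is left out for brevity). It assumes boxes are integer pixel coordinates in (y1, x1, y2, x2) order with exclusive end coordinates, and the function names `crop_boxes_ragged` and `unflatten_crops` are hypothetical, not part of KerasCV. Instead of looping over boxes, it computes the source row and column of every output pixel for all boxes at once and gathers them in a single indexing operation. The result is a flat pixel list plus the (height, width) of each crop, similar to the flat-values-plus-partitions layout that `tf.RaggedTensor` uses.

```python
import numpy as np


def crop_boxes_ragged(image, boxes):
    """Hypothetical helper: crop every box from one image without a Python loop.

    image: [H, W, C] array.
    boxes: [n_boxes, 4] integer pixel boxes (y1, x1, y2, x2), end-exclusive
           (assumed format), each with positive height and width.
    Returns (flat_values, sizes):
      flat_values: [total_pixels, C], the pixels of all crops concatenated
                   in row-major order, box after box.
      sizes:       [n_boxes, 2], the (height, width) of each crop.
    """
    boxes = np.asarray(boxes, dtype=np.int64).reshape(-1, 4)
    y1, x1, y2, x2 = boxes.T
    heights, widths = y2 - y1, x2 - x1
    areas = heights * widths
    # For every output pixel, the index of the box it belongs to.
    box_id = np.repeat(np.arange(len(boxes)), areas)
    # Offset of every output pixel inside its own box (0 .. area - 1).
    starts = np.cumsum(areas) - areas
    local = np.arange(areas.sum()) - starts[box_id]
    # Convert the in-box offset to source row/column coordinates.
    rows = y1[box_id] + local // widths[box_id]
    cols = x1[box_id] + local % widths[box_id]
    # A single gather covers all boxes.
    flat_values = image[rows, cols]
    return flat_values, np.stack([heights, widths], axis=1)


def unflatten_crops(flat_values, sizes):
    """Hypothetical helper: split the flat pixels back into [h, w, C] crops."""
    areas = sizes[:, 0] * sizes[:, 1]
    pieces = np.split(flat_values, np.cumsum(areas)[:-1])
    channels = flat_values.shape[-1]
    return [p.reshape(h, w, channels) for p, (h, w) in zip(pieces, sizes)]


image = np.arange(5 * 6 * 3).reshape(5, 6, 3)
flat, sizes = crop_boxes_ragged(image, [[0, 0, 2, 3], [1, 2, 5, 6]])
crops = unflatten_crops(flat, sizes)
print([c.shape for c in crops])  # [(2, 3, 3), (4, 4, 3)]
```

A batched layer could apply the same idea by also recording which image each box belongs to and indexing with (image_index, row, col).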
KerasCV already has cropping functions for preprocessing and augmentation, but it has no layer that efficiently crops multiple bounding boxes from an image. This functionality is required to build two-stage detectors for high-resolution images, where the graph looks something like this:
1. Input: the original, full-resolution image.
2. Preprocessing: resize/rescale the image for the object detector.
3. Object detection: YOLO, Faster R-CNN, etc.
4. Cropping layer (THIS IS WHERE THE MAGIC HAPPENS): using the bounding boxes from the object detector, crop the objects from the original, full-resolution image in step 1 and resize/pad them for the classifier.
5. Classifier: ResNet, EfficientNet, etc.
6. Post-processing: format the tensors for output.
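Because the detector in step 3 sees the resized image from step 2, while step 4 crops from the original image in step 1, the predicted boxes first have to be mapped back to full-resolution pixel coordinates. A minimal NumPy sketch, assuming (y1, x1, y2, x2) boxes, a plain resize with no letterbox padding in step 2, and a hypothetical helper name `boxes_to_original`:

```python
import numpy as np


def boxes_to_original(boxes, resized_hw, original_hw):
    """Hypothetical helper: map detector boxes back to the full-resolution image.

    boxes:       [n, 4] (y1, x1, y2, x2) in the resized image's pixel frame.
    resized_hw:  (height, width) of the image the detector saw (step 2).
    original_hw: (height, width) of the full-resolution image (step 1).
    Assumes step 2 was a plain resize (no padding or letterboxing).
    """
    boxes = np.asarray(boxes, dtype=np.float64).reshape(-1, 4)
    sy = original_hw[0] / resized_hw[0]
    sx = original_hw[1] / resized_hw[1]
    scaled = boxes * np.array([sy, sx, sy, sx])
    # Round outward so the crop still covers the whole detected object.
    top_left = np.floor(scaled[:, :2])
    bottom_right = np.ceil(scaled[:, 2:])
    out = np.concatenate([top_left, bottom_right], axis=1).astype(np.int64)
    # Keep every coordinate inside the original image.
    limits = np.array([original_hw[0], original_hw[1]] * 2)
    return np.clip(out, 0, limits)


# A 100x100 detector input produced from a 1000x2000 (H x W) original.
print(boxes_to_original([[10, 20, 50, 60]], (100, 100), (1000, 2000)).tolist())
# [[100, 400, 500, 1200]]
```

Rounding outward trades a pixel or two of extra background for never clipping part of the object, which is usually the safer choice before a classifier.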
Step 4 of the outline above depends on a robust bounding-box cropping layer. The closest implementation I have found is TensorFlow's tf.image.crop_and_resize. Its only drawback is that the resizing step does not preserve the aspect ratio: every crop is stretched to the same fixed crop size. However, keras_cv.layers.Resizing seems to offer more flexible resizing options and accepts ragged tensors.
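The aspect-ratio problem is easiest to see with a wide crop: `tf.image.crop_and_resize` stretches every crop to `crop_size`, so a 2x4 crop resized to 8x8 is scaled 4x vertically but only 2x horizontally. An aspect-preserving alternative scales by the smaller of the two ratios and pads the remainder (a "letterbox" resize; TensorFlow's `tf.image.resize_with_pad` does this for whole images). The NumPy sketch below uses nearest-neighbour sampling to stay short; the name `resize_with_pad_nearest` and the centred zero padding are assumptions, not the behaviour of any particular KerasCV layer.

```python
import numpy as np


def resize_with_pad_nearest(crop, target_h, target_w):
    """Hypothetical helper: aspect-preserving resize of one [h, w, C] crop.

    Scales by min(target_h / h, target_w / w) with nearest-neighbour
    sampling, then centres the result on a zero-filled target canvas.
    """
    h, w, c = crop.shape
    scale = min(target_h / h, target_w / w)
    new_h = min(target_h, max(1, int(round(h * scale))))
    new_w = min(target_w, max(1, int(round(w * scale))))
    # Nearest source pixel for each output row/column.
    rows = np.minimum((np.arange(new_h) / scale).astype(np.int64), h - 1)
    cols = np.minimum((np.arange(new_w) / scale).astype(np.int64), w - 1)
    resized = crop[rows[:, None], cols[None, :]]
    # Centre the resized crop and leave the padding at zero.
    out = np.zeros((target_h, target_w, c), dtype=crop.dtype)
    top = (target_h - new_h) // 2
    left = (target_w - new_w) // 2
    out[top:top + new_h, left:left + new_w] = resized
    return out


wide_crop = np.ones((2, 4, 3), dtype=np.uint8)
boxed = resize_with_pad_nearest(wide_crop, 8, 8)
print(boxed.shape)                           # (8, 8, 3)
print(boxed[:, :, 0].sum(axis=1).tolist())   # [0, 0, 8, 8, 8, 8, 0, 0]
```

Only rows 2 to 5 contain image content; the rest is padding, so the crop keeps its 1:2 shape instead of being squashed into a square.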
Due to limitations in PyTorch, I have to crop the bounding boxes in a for loop, and TensorFlow's tf.image.crop_and_resize offers only limited resizing options. This is an opportunity for Keras to offer functionality that other frameworks lack but that is needed to build a specific class of models.
@Michael-Blackwell Thanks for filing the issue. This would make a good custom layer for users, but we don't have enough requests for this feature yet, so I will be closing the issue.
Papers
Multi-Stage-CV-Detection
Existing Implementations
The best implementation I could find is tf.image.crop_and_resize, but again, the resizing options are limited.
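For reference, `tf.image.crop_and_resize(image, boxes, box_indices, crop_size, ...)` already handles batches without a loop: `boxes` holds normalized (y1, x1, y2, x2) coordinates, where a normalized y maps to the image coordinate y * (image_height - 1), and `box_indices` gives the index of the batch image each box comes from. Converting end-exclusive pixel boxes into that form is a small step; the NumPy sketch below uses a hypothetical helper name, `to_crop_and_resize_boxes`, and assumes images larger than one pixel in each dimension.

```python
import numpy as np


def to_crop_and_resize_boxes(pixel_boxes, image_index, image_hw):
    """Hypothetical helper: pixel boxes -> arguments for tf.image.crop_and_resize.

    pixel_boxes: [n, 4] end-exclusive (y1, x1, y2, x2) pixel boxes.
    image_index: [n] index of the source image in the batch for each box.
    image_hw:    (height, width) of the images; both must be > 1.
    Returns (boxes, box_indices) as float32 / int32 arrays.
    """
    b = np.asarray(pixel_boxes, dtype=np.float64).reshape(-1, 4)
    h, w = image_hw
    # crop_and_resize maps normalized y to y * (height - 1), so the last
    # included pixel (y2 - 1) corresponds to (y2 - 1) / (height - 1).
    boxes = np.stack(
        [b[:, 0] / (h - 1), b[:, 1] / (w - 1),
         (b[:, 2] - 1) / (h - 1), (b[:, 3] - 1) / (w - 1)],
        axis=1,
    ).astype(np.float32)
    box_indices = np.asarray(image_index, dtype=np.int32)
    return boxes, box_indices


boxes, box_indices = to_crop_and_resize_boxes(
    [[0, 0, 5, 6], [1, 2, 3, 4]], [0, 1], (5, 6)
)
print(boxes, box_indices)
```

These arrays could then be passed as `tf.image.crop_and_resize(images, boxes, box_indices, crop_size=(h, w))`. The result is a dense `[num_boxes, h, w, C]` tensor, which is where the fixed-size, stretched output comes from.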