pytorch / vision

Datasets, Transforms and Models specific to Computer Vision
https://pytorch.org/vision
BSD 3-Clause "New" or "Revised" License

[RFC] Rotated Bounding Boxes #2761

Open oke-aditya opened 3 years ago

oke-aditya commented 3 years ago

🚀 Feature

I'm a bit unsure about this feature.

Support Rotated Bounding Boxes in Torchvision.

Motivation

There is recent research showing that rotated bounding boxes can provide better detection results. I was not able to find highly cited work, but a few examples are:

  1. Rotation Invariant Detector
  2. SCRDET ICCV 2019.

I'm not aware of more papers. This is a fairly new topic and needs a bit more research.

Pitch

  1. Support operations for rotated bounding boxes: NMS, RoI Align, IoU.
  2. Support Augmentation (if needed) for Rotated Bounding boxes.
  3. Support such Datasets (if any) that use Rotated Boxes.

If possible, we could also support rotated detection models, though in my opinion that might not be very feasible, since those models would take a lot of time and maintenance.

Alternatives

Currently, Detectron2 has great support for all the above. These operations are implemented in C++.

Additional context

Looks tricky and challenging. I guess it might also be too early to think about this. I think it needs a bit more research and some baseline.

cc @pmeier

fmassa commented 3 years ago

Hi,

This is a good thing to consider adding to torchvision, but as you mentioned it might involve quite a few changes so it won't be easy to do.

For those interested in having support for rotated bounding boxes, please "thumbs up" the main post.

zhiqwang commented 3 years ago

Hi @fmassa, @oke-aditya, I noticed that there are implementations such as nms_rotated, ROIAlignRotated and pairwise_iou_rotated in detectron2. And the author noted that

Note: this function (nms_rotated) might be moved into torchvision/ops/boxes.py in the future

Maybe this is an opportune moment to do this?

cc @ppwwyyxx

oke-aditya commented 3 years ago

Let's discuss this @zhiqwang

Here are a few points on what I think.

  1. Detectron2 has a class for regular boxes as well as one for rotated bounding boxes. We don't have any such classes in torchvision; we keep the abstraction at the Tensor level, which makes these ops very easy to use.

For Rotated boxes, we would need to implement these common operations.

Both torchvision and detectron2 represent non-rotated bounding boxes as (x1, y1, x2, y2). We can convert between formats, but all the operations are implemented for this format only.

Detectron2 represents rotated boxes as (x_center, y_center, width, height, angle). I think this could be standard for torchvision too.
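For illustration, going from the (x1, y1, x2, y2) format to the (x_center, y_center, width, height, angle) format is trivial for the unrotated case. A sketch in plain Python (the function name is hypothetical, not torchvision API):

```python
def xyxy_to_cxcywha(box):
    """Convert an axis-aligned (x1, y1, x2, y2) box to the
    (x_center, y_center, width, height, angle) rotated format.
    An unrotated box simply gets angle = 0."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2, (y1 + y2) / 2, x2 - x1, y2 - y1, 0.0)

print(xyxy_to_cxcywha((0.0, 0.0, 4.0, 2.0)))  # (2.0, 1.0, 4.0, 2.0, 0.0)
```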

Some proposals for ops

  1. For ops such as box_area and clip_boxes_to_image, we can overload them to accept both Tensor[N, 4] for the non-rotated case and Tensor[N, 5] for the rotated case. This would keep the number of ops smaller. Otherwise, since we don't represent boxes as classes, we would need parallel ops like rotated_box_area, rotated ... Both approaches are fine though.

  2. Some ops are implemented in C++ in Detectron2, as @zhiqwang mentioned. I'm unsure about those.

  3. We need to provide rotated bounding box conversions. This is tricky again; as in point 1, we can either overload to accept Tensor[N, 4] for non-rotated and Tensor[N, 5] for rotated boxes, or provide a separate rotated_box_convert.

I think the rotated bounding box is a generalization of the non-rotated case: an angle of 0 represents a normal box. (Please correct me if I'm wrong.)

  1. We could provide a simple utility to convert a rotated box to the non-rotated case (tilt the angle back to 0 and remove that dim).
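Such a utility could go either way: "tilting back" to angle 0 keeps the box tight but changes its orientation, while taking the axis-aligned box that encloses the rotated corners stays faithful to the image but is looser. A sketch of the enclosing-box variant, assuming the (cx, cy, w, h, angle-in-degrees) convention (function names hypothetical):

```python
import math

def rotated_to_corners(cx, cy, w, h, angle_deg):
    """Corner coordinates of a (cx, cy, w, h, angle) rotated box."""
    a = math.radians(angle_deg)
    cos_a, sin_a = math.cos(a), math.sin(a)
    return [(cx + dx * cos_a - dy * sin_a, cy + dx * sin_a + dy * cos_a)
            for dx, dy in ((-w / 2, -h / 2), (w / 2, -h / 2),
                           (w / 2, h / 2), (-w / 2, h / 2))]

def rotated_to_xyxy(cx, cy, w, h, angle_deg):
    """Axis-aligned (x1, y1, x2, y2) box enclosing a rotated box."""
    xs, ys = zip(*rotated_to_corners(cx, cy, w, h, angle_deg))
    return (min(xs), min(ys), max(xs), max(ys))

print(rotated_to_xyxy(0, 0, 4, 2, 0))  # (-2.0, -1.0, 2.0, 1.0)
# Rotating 90 degrees swaps width and height (up to float error):
print([round(v, 6) for v in rotated_to_xyxy(0, 0, 4, 2, 90)])  # [-1.0, -2.0, 1.0, 2.0]
```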

My thoughts might be naive; I guess this needs more thought.

zhiqwang commented 3 years ago

Hi @oke-aditya , the points you highlight and PR #2710 make the implementation of box conversion in torchvision clear to me.

Both torchvision and detectron2 represent non-rotated bounding boxes as (x1, y1, x2, y2). We can convert between formats, but all the operations are implemented for this format only.

I think one reason both torchvision and detectron2 use the (x1, y1, x2, y2) format to represent rectangular bounding boxes is the ubiquity of the COCO dataset. But for rotated bounding box tasks, is there a standard/benchmark dataset yet? I have encountered two tasks that use rotated bounding boxes: one is the OCR task (ICDAR 2015), and the other is aerial image detection. These two use the polygon format (x1, y1, x2, y2, x3, y3, x4, y4) (~but they are rotated boxes actually?~ Edited: I was wrong here, they actually use the quadrilateral format). So one must first convert the polygon format to (x_center, y_center, width, height, angle) in order to use detectron2.
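For reference, recovering (x_center, y_center, width, height, angle) from such a quadrilateral annotation is straightforward when the quad really is a rotated rectangle; otherwise the result below is only an approximation. A sketch in plain Python (function name hypothetical):

```python
import math

def quad_to_rotated(quad):
    """Approximate a quadrilateral (x1, y1, ..., x4, y4), given in order
    around the box, by a (cx, cy, w, h, angle-in-degrees) rotated rectangle.
    Exact when the quad is itself a rotated rectangle."""
    pts = [(quad[i], quad[i + 1]) for i in range(0, 8, 2)]
    cx = sum(x for x, _ in pts) / 4
    cy = sum(y for _, y in pts) / 4
    w = math.dist(pts[0], pts[1])  # length of the first edge
    h = math.dist(pts[1], pts[2])  # length of the adjacent edge
    # Angle of the first edge relative to the x-axis.
    angle = math.degrees(math.atan2(pts[1][1] - pts[0][1], pts[1][0] - pts[0][0]))
    return (cx, cy, w, h, angle)

print(quad_to_rotated((0, 0, 4, 0, 4, 2, 0, 2)))  # (2.0, 1.0, 4.0, 2.0, 0.0)
```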

And as @vfdev-5 , @pmeier and @oke-aditya mentioned in #2753 :

OCR is a higher level application and thus should be in a separate package which might depend on torchvision.

But if operations like nms_rotated and pairwise_iou_rotated were implemented in torchvision, it would be convenient for other users.
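To give a sense of what pairwise_iou_rotated has to do under the hood: unlike axis-aligned IoU, the intersection of two rotated boxes is a convex polygon, which can be computed by Sutherland-Hodgman clipping plus the shoelace formula. A pure-Python sketch (detectron2 implements this in C++/CUDA; the function names here are hypothetical):

```python
import math

def _corners(cx, cy, w, h, angle_deg):
    """Counter-clockwise corner list of a (cx, cy, w, h, angle) box."""
    a = math.radians(angle_deg)
    c, s = math.cos(a), math.sin(a)
    return [(cx + dx * c - dy * s, cy + dx * s + dy * c)
            for dx, dy in ((-w / 2, -h / 2), (w / 2, -h / 2),
                           (w / 2, h / 2), (-w / 2, h / 2))]

def _clip(subject, a, b):
    """Keep the part of polygon `subject` on the left of directed edge a->b."""
    def inside(p):
        return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0]) >= 0
    def intersect(p, q):
        # Intersection of line p-q with line a-b (determinant form).
        dc = (a[0] - b[0], a[1] - b[1])
        dp = (p[0] - q[0], p[1] - q[1])
        n1 = a[0] * b[1] - a[1] * b[0]
        n2 = p[0] * q[1] - p[1] * q[0]
        n3 = dc[0] * dp[1] - dc[1] * dp[0]
        return ((n1 * dp[0] - n2 * dc[0]) / n3, (n1 * dp[1] - n2 * dc[1]) / n3)
    out = []
    for i, p in enumerate(subject):
        q = subject[i - 1]
        if inside(p):
            if not inside(q):
                out.append(intersect(q, p))
            out.append(p)
        elif inside(q):
            out.append(intersect(q, p))
    return out

def _area(poly):
    """Shoelace formula."""
    return abs(sum(poly[i - 1][0] * p[1] - p[0] * poly[i - 1][1]
                   for i, p in enumerate(poly))) / 2

def iou_rotated(box1, box2):
    """IoU of two (cx, cy, w, h, angle) boxes via polygon clipping."""
    p1, p2 = _corners(*box1), _corners(*box2)
    inter = p1
    for i in range(4):
        if not inter:
            break
        inter = _clip(inter, p2[i - 1], p2[i])
    ia = _area(inter) if inter else 0.0
    return ia / (_area(p1) + _area(p2) - ia)

print(iou_rotated((0, 0, 2, 2, 0), (0, 0, 2, 2, 0)))  # 1.0
```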

oke-aditya commented 3 years ago

Let's have more thoughts from @fmassa

sugatoray commented 3 years ago

Displaying bounding boxes for rotated rectangles.

Given the rotated blue rectangular region, how to show the label?

If the labels are rotated as well, it will be quite challenging to rotate your head every time to read them. On the other hand, if the label is shown within the non-rotated bounding box (green), which also contains the target rotated rectangle (blue), then it is clear where the rotated bounding box is and the label is also easy to read.

So each bounding box is then composed of two things: the rotated rectangle itself (blue) and the non-rotated box enclosing it (green).

But, any format specification should be adequate to determine both bounding boxes and draw them as well.


ashnair1 commented 2 years ago

Are there plans to port the rotated ops from detectron2 to torchvision?

ashnair1 commented 1 year ago

Any update on this? This would be really beneficial to the remote sensing community since, in RS imagery, objects of various sizes & orientations can be clustered together, and rotated object detectors generally outperform default detectors in those cases. If the rotated ops in detectron2 are ported over, we can write up a Rotated Faster-RCNN for torchvision.

oke-aditya commented 1 year ago

@pmeier I guess with the new Features API this may be possible in the near future?

pmeier commented 1 year ago

Yes, in theory we could have features.RotatedBoundingBox with shape (*, 5) where 4 items represent the coordinates and one represents the angle. For now, we already implemented rotate_bounding_box:

https://github.com/pytorch/vision/blob/149edda463b54b3eabe989e260a839727c89d099/torchvision/prototype/transforms/functional/_geometry.py#L573

This actually rotates the coordinates and thus the result is a "proper" (for the lack of a better term) bounding box. Thus, the bounding box may change shape.
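To illustrate why the result may change shape: rotating the corners of an axis-aligned box and re-fitting an axis-aligned box around them necessarily grows the box for any angle that is not a multiple of 90 degrees. A sketch (the name is hypothetical, not the prototype API):

```python
import math

def rotate_box_xyxy(box, angle_deg):
    """Rotate the corners of an (x1, y1, x2, y2) box about its center and
    return the axis-aligned box enclosing the result, mirroring what a
    coordinate-rotating transform must do when the output stays axis-aligned."""
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
    a = math.radians(angle_deg)
    cos_a, sin_a = math.cos(a), math.sin(a)
    xs, ys = [], []
    for x, y in ((x1, y1), (x2, y1), (x2, y2), (x1, y2)):
        dx, dy = x - cx, y - cy
        xs.append(cx + dx * cos_a - dy * sin_a)
        ys.append(cy + dx * sin_a + dy * cos_a)
    return (min(xs), min(ys), max(xs), max(ys))

# A 4x2 box rotated by 45 degrees needs a larger enclosing box
# (roughly 4.24 x 4.24, the diagonal in both directions):
print(rotate_box_xyxy((0, 0, 4, 2), 45))
```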

Excuse my ignorance here, but is rotation invariance that important? What about all the other transformations that change the shape of bounding boxes, like affine, elastic, perspective, and so on? From my, admittedly limited, perspective it seems weird that rotation is singled out here. Or to put it differently, having a features.RotatedBoundingBox in the library would be odd, since it behaves just like features.BoundingBox except for one transformation.

ashnair1 commented 1 year ago

Rotation invariance is indeed very important when doing object detection on remote sensing imagery. But the addition of a RotatedBoundingBox is meant to better represent the objects in the scene, not as a form of augmentation.

Some advantages of RotatedBoxes over regular Boxes:

Below is an example from the DOTA paper comparing predictions from Faster-RCNN vs Rotated Faster-RCNN.

These same features are why rotated boxes are used for sign detection as well. From a representation perspective, the default BoundingBox is a special case of the RotatedBoundingBox with angle = 0 degrees.

oke-aditya commented 1 year ago

@pmeier We might need to look into Detectron2 for some inspiration. As far as I know, operations such as NMS, IoU, etc. differ for rotated boxes.

pmeier commented 1 year ago

Maybe I misunderstood the context here: are we talking about rotated bounding boxes at the transforms or at the model stage?

ashnair1 commented 1 year ago

Maybe I misunderstood the context here: are we talking about rotated bounding boxes at the transforms or at the model stage?

At the modelling stage. You would need rotated variants for NMS, IOU calculation, RPNs, ROIAlign etc. Just like in detectron2.
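For what it's worth, the greedy NMS loop itself is agnostic to the box representation; only the pairwise IoU kernel differs. A sketch (names hypothetical) showing the generic loop, exercised here with a plain axis-aligned IoU; a rotated variant would plug in a polygon-based IoU and (cx, cy, w, h, angle) boxes instead:

```python
def nms(boxes, scores, iou_fn, iou_threshold=0.5):
    """Greedy NMS parameterized over the IoU kernel: keep the highest-scoring
    box, drop everything overlapping it above the threshold, repeat."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        i = order.pop(0)
        keep.append(i)
        order = [j for j in order if iou_fn(boxes[i], boxes[j]) <= iou_threshold]
    return keep

def iou_xyxy(a, b):
    """Axis-aligned IoU, used here only to exercise the generic loop."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area = lambda t: (t[2] - t[0]) * (t[3] - t[1])
    return inter / (area(a) + area(b) - inter)

boxes = [(0, 0, 10, 10), (1, 1, 10, 10), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores, iou_xyxy))  # [0, 2] -- box 1 is suppressed
```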

pmeier commented 1 year ago

Ok, in that case the prototype features won't really help here, as the models will only work with plain tensors. Basically, in the transform stage features.RotatedBoundingBox would behave just like the regular features.BoundingBox except for rotate. Apart from that, all models and ops that are not associated with augmentation will have to be implemented.

I don't know what the priority for this would be. From what I understand is that our roadmap for H2 this year is already full, but I'll let @datumbox chime in here.

oke-aditya commented 1 year ago

Since I created the issue: my initial idea was that this isn't primarily about transforms, though we would need a few small transforms (say, horizontal flip) that need both the image and the box.

But other than that mostly it's on the ops side.

datumbox commented 1 year ago

This is a bigger piece of work that we should consider after H2. Philip is right to say that our roadmap is currently full, and this is quite a sizeable piece of work to squeeze in. As Aditya said, this is not just about adding support for transforms but also for operators.

I would be in favour of reviewing the support for this as a whole, as we would need to provide Transforms, Ops and Models that are compatible. IMO we should avoid adding more partial support for features in TorchVision; we already have quite a few things that are partially supported (Keypoints, Video Decoding etc).

We are currently investing in closing any high-priority gaps on the already supported tasks. Once this is done, we should stack-rank big proposed features to see which of them are worth investing in. The situation I would like to avoid is having some support for Rotated BBoxes in some Transforms/Ops/Models but not in others. This is likely to make our Detection API more complex and harder to revamp.