Open oke-aditya opened 4 years ago
Hi,
This is a good thing to consider adding to torchvision, but as you mentioned it might involve quite a few changes so it won't be easy to do.
For those interested in having support for rotated bounding boxes, please "thumbs up" the main post.
Hi @fmassa, @oke-aditya, I noticed that there are implementations such as nms_rotated, ROIAlignRotated and pairwise_iou_rotated in detectron2. And the author noted that
Note: this function (nms_rotated) might be moved into torchvision/ops/boxes.py in the future
Maybe this is an opportune moment to do this?
cc @ppwwyyxx
Let's discuss this @zhiqwang
Here are a few thoughts on this.
For rotated boxes, we would need to implement these common operations.
Both torchvision and detectron2 represent non-rotated bounding boxes as (x1, y1, x2, y2). We can convert between formats, but all the operations are implemented for this format only.
Detectron2 represents rotated boxes as (x_center, y_center, width, height, angle).
I think this could be the standard for torchvision too.
Some proposals for ops
For ops such as box_area and clip_boxes_to_image, we can overload them to accept both Tensor[N, 4] for the non-rotated case and Tensor[N, 5] for the rotated case. This would keep the number of ops down. Otherwise we would have ops like rotated_box_area, rotated .., since we don't represent boxes as classes. Both are fine though.
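A minimal sketch of the overloading idea, assuming plain Python sequences stand in for Tensor[N, 4] and Tensor[N, 5]. The function below is a hypothetical illustration and not the existing torchvision box_area; it only dispatches on the number of values per box.

```python
from typing import List, Sequence


def box_area(boxes: Sequence[Sequence[float]]) -> List[float]:
    """Area of each box, dispatching on the number of values per box.

    4 values: (x1, y1, x2, y2), non-rotated.
    5 values: (x_center, y_center, width, height, angle), rotated.
    """
    areas = []
    for box in boxes:
        if len(box) == 4:
            x1, y1, x2, y2 = box
            areas.append((x2 - x1) * (y2 - y1))
        elif len(box) == 5:
            # Rotation does not change the area, so the angle is ignored.
            _cx, _cy, w, h, _angle = box
            areas.append(w * h)
        else:
            raise ValueError("expected 4 or 5 values per box, got %d" % len(box))
    return areas
```

The alternative discussed above would keep this function for 4-value boxes only and add a separate rotated_box_area for the 5-value case.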
Some ops are done using C++ in Detectron2 like @zhiqwang mentioned. Unsure about them.
We need to provide rotated bounding box conversions. This is tricky again. We can either overload the conversion to accept Tensor[N, 4] for non-rotated and Tensor[N, 5] for rotated boxes, or provide a separate rotated_box_convert.
I think a rotated bounding box is a generalization of the non-rotated case: a box whose angle is 0 is a normal box. (Please correct me if I'm wrong.)
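To illustrate the special-case relationship, here is a small sketch assuming the (x_center, y_center, width, height, angle) layout mentioned above. Both function names are hypothetical helpers, not torchvision or detectron2 APIs.

```python
from typing import Tuple

XYXY = Tuple[float, float, float, float]
Rotated = Tuple[float, float, float, float, float]


def xyxy_to_rotated(box: XYXY) -> Rotated:
    """Embed an axis-aligned (x1, y1, x2, y2) box as a rotated box with angle 0."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2, (y1 + y2) / 2, x2 - x1, y2 - y1, 0.0)


def rotated_to_xyxy(box: Rotated) -> XYXY:
    """Inverse of xyxy_to_rotated; only exact when the angle is 0."""
    cx, cy, w, h, angle = box
    if angle != 0:
        raise ValueError("a box with a non-zero angle has no exact (x1, y1, x2, y2) form")
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)
```

Every non-rotated box round-trips through the rotated representation, but not the other way around, which is the sense in which the rotated format is more general.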
My thoughts might be naive. I guess this needs more thought.
Hi @oke-aditya, the points you highlight and PR #2710 make the implementation of box conversion in torchvision clear to me.
Both torchvision and detectron2 represent non-rotated bounding boxes as (x1, y1, x2, y2). We can convert between formats, but all the operations are implemented for this format only.
I think one reason both torchvision and detectron2 use the (x1, y1, x2, y2) format for rectangular bounding boxes is the widespread use of the COCO dataset. But for rotated bounding box tasks, is there no standard/benchmark dataset yet? I have encountered two tasks that use rotated bounding boxes. One is OCR (ICDAR 2015), and the other is aerial image detection. Both use the polygon format (x1, y1, x2, y2, x3, y3, x4, y4) (~but they are rotated boxes actually?~ Edited: I'm wrong here, they actually use the quadrilateral format). So one must first convert the polygon format to (x_center, y_center, width, height, angle) in order to use detectron2.
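The conversion mentioned above could look roughly like this. It is a sketch under strong assumptions: the four points are the corners of a true rectangle, listed in order around the box, and the angle is taken as the direction of the first edge in degrees. That sign convention is an assumption and may not match detectron2's. The name quad_to_rotated_box is hypothetical.

```python
import math
from typing import Sequence, Tuple


def quad_to_rotated_box(quad: Sequence[float]) -> Tuple[float, float, float, float, float]:
    """Convert (x1, y1, x2, y2, x3, y3, x4, y4) to (cx, cy, width, height, angle).

    Assumes the points are the corners of a rectangle in consecutive order.
    """
    x1, y1, x2, y2, x3, y3, x4, y4 = quad
    cx = (x1 + x2 + x3 + x4) / 4
    cy = (y1 + y2 + y3 + y4) / 4
    width = math.hypot(x2 - x1, y2 - y1)   # length of the first edge
    height = math.hypot(x3 - x2, y3 - y2)  # length of the second edge
    angle = math.degrees(math.atan2(y2 - y1, x2 - x1))  # direction of the first edge
    return (cx, cy, width, height, angle)
```

For general quadrilaterals, as in the ICDAR and aerial datasets, the points do not form a rectangle, so a minimum-area rectangle fit (for example OpenCV's minAreaRect) would be needed instead, and the conversion becomes lossy.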
And as @vfdev-5, @pmeier and @oke-aditya mentioned in #2753:
OCR is a higher level application and thus should be in a separate package which might depend on torchvision.
But if operations like nms_rotated and pairwise_iou_rotated were implemented in torchvision, it would be convenient for other users.
Let's have more thoughts from @fmassa
Given the rotated blue rectangular region, how should the label be shown?
If the labels are rotated as well, it will be quite challenging to rotate your head every time to read them. On the other hand, if the label is shown within the non-rotated bounding box (green), which also contains the target rotated rectangle (blue), then it is clear where the rotated bounding box is and the label is also easy to read.
So, each bounding box then is composed of two things:
[(x1, y1), (x2, y2), (x3, y3), (x4, y4)]
[x_center, y_center, height, width, angle]
But, any format specification should be adequate to determine both bounding boxes and draw them as well.
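As a sketch of how one format can determine both boxes, the code below derives the four corners and the enclosing non-rotated box from a rotated box. It assumes the (x_center, y_center, width, height, angle) ordering used earlier in the thread, with the angle in degrees; the function names are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def rotated_box_corners(box: Tuple[float, float, float, float, float]) -> List[Point]:
    """Four corners of a (cx, cy, width, height, angle_degrees) box."""
    cx, cy, w, h, angle_deg = box
    theta = math.radians(angle_deg)
    c, s = math.cos(theta), math.sin(theta)
    corners = []
    for dx, dy in ((-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)):
        # Rotate the corner offset about the center, then translate.
        corners.append((cx + dx * c - dy * s, cy + dx * s + dy * c))
    return corners


def enclosing_xyxy(box: Tuple[float, float, float, float, float]) -> Tuple[float, float, float, float]:
    """Smallest axis-aligned (x1, y1, x2, y2) box containing the rotated box."""
    xs, ys = zip(*rotated_box_corners(box))
    return (min(xs), min(ys), max(xs), max(ys))
```

The corners give the polygon to draw (blue in the description above), and enclosing_xyxy gives the non-rotated box (green) where an upright label could be placed.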
Are there plans to port the rotated ops from detectron2 to torchvision?
Any update on this? This would be really beneficial to the remote sensing community since, in RS imagery, objects of various sizes & orientations can be clustered together, and rotated object detectors generally outperform default detectors in those cases. If the rotated ops in detectron2 are ported over, we can write up a Rotated Faster-RCNN for torchvision.
@pmeier I guess with the new Features API this may be possible in near future?
Yes, in theory we could have features.RotatedBoundingBox with shape (*, 5) where 4 items represent the coordinates and one represents the angle. For now, we already implemented rotate_bounding_box:
This actually rotates the coordinates and thus the result is a "proper" (for the lack of a better term) bounding box. Thus, the bounding box may change shape.
Excuse my ignorance here, but is rotation invariance that important? What about all the other transformations that change the shape of bounding boxes, like affine, elastic, perspective, and so on? From my, admittedly limited, perspective it seems weird that rotation is singled out here. Or to put it differently, having a features.RotatedBoundingBox in the library would be odd, since it behaves just like features.BoundingBox except for one transformation.
Rotation invariance is indeed very important when doing object detection on remote sensing imagery. But the purpose of adding a RotatedBoundingBox is to better represent the objects in the scene, not to serve as a form of augmentation.
Some advantages of RotatedBoxes over regular Boxes:
Below is an example from the DOTA paper comparing predictions from Faster-RCNN vs Rotated Faster-RCNN:
These same features are why rotated boxes are used for sign detection as well. From a representation perspective, the default BoundingBox is a special case of the RotatedBoundingBox with angle = 0 degrees.
@pmeier We might need to look into Detectron2 for some inspiration. As far as I know, operations such as NMS and IoU differ for rotated boxes.
Maybe I misunderstood the context here: are we talking about rotated bounding boxes at the transforms or at the model stage?
At the modelling stage. You would need rotated variants for NMS, IOU calculation, RPNs, ROIAlign etc. Just like in detectron2.
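To show why the rotated variants differ from the axis-aligned ones, here is a rough pure-Python sketch of IoU between two rotated boxes. It clips one rectangle against the other with Sutherland-Hodgman polygon clipping and measures the result with the shoelace formula. This is not detectron2's implementation; the (cx, cy, width, height, angle in degrees) layout, positive widths and heights, and all function names are assumptions made for illustration.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float, float]


def corners(box: Box) -> List[Point]:
    """Corners of a (cx, cy, w, h, angle_degrees) box in a fixed winding order."""
    cx, cy, w, h, angle_deg = box
    c, s = math.cos(math.radians(angle_deg)), math.sin(math.radians(angle_deg))
    offsets = ((-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2))
    return [(cx + dx * c - dy * s, cy + dx * s + dy * c) for dx, dy in offsets]


def polygon_area(poly: List[Point]) -> float:
    """Shoelace formula; returns 0.0 for an empty polygon."""
    total = 0.0
    for i in range(len(poly)):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % len(poly)]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2


def side(a: Point, b: Point, p: Point) -> float:
    """Positive when p lies on the interior side of the directed edge a -> b."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def clip(subject: List[Point], clipper: List[Point]) -> List[Point]:
    """Sutherland-Hodgman: intersect `subject` with the convex polygon `clipper`."""
    output = list(subject)
    for i in range(len(clipper)):
        if not output:
            break
        a, b = clipper[i], clipper[(i + 1) % len(clipper)]
        current, output = output, []
        for j in range(len(current)):
            p, q = current[j], current[(j + 1) % len(current)]
            sp, sq = side(a, b, p), side(a, b, q)
            if sp >= 0:
                output.append(p)  # p is inside or on the clipping edge
            if (sp > 0 and sq < 0) or (sp < 0 and sq > 0):
                t = sp / (sp - sq)  # where the segment p -> q crosses the edge
                output.append((p[0] + t * (q[0] - p[0]), p[1] + t * (q[1] - p[1])))
    return output


def rotated_iou(box_a: Box, box_b: Box) -> float:
    """IoU of two rotated boxes; 0.0 when they do not overlap."""
    inter = polygon_area(clip(corners(box_a), corners(box_b)))
    union = box_a[2] * box_a[3] + box_b[2] * box_b[3] - inter
    return inter / union if union > 0 else 0.0
```

A rotated NMS would use such a pairwise IoU in place of the axis-aligned one. A production version would need batching and more careful handling of degenerate and nearly coincident edges, which is presumably part of why the detectron2 ops are written in C++.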
Ok, in that case the prototype features won't really help here as the models will only work with plain tensors. Basically, in the transform stage the features.RotatedBoundingBox will behave just like the regular features.BoundingBox except for rotate. Apart from that, all models and ops that are not associated with augmentation will have to be implemented.
I don't know what the priority for this would be. From what I understand, our roadmap for H2 this year is already full, but I'll let @datumbox chime in here.
Since I created the issue: my initial idea was that it isn't exactly about transforms, but we would need a few small transforms, say horizontal flip, that need both the image and the box.
But other than that, it's mostly on the ops side.
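As an example of such a small transform, a horizontal flip of a rotated box could be sketched as follows. It assumes the (x_center, y_center, width, height, angle) layout and that mirroring about a vertical axis negates the angle, which holds for a rectangle rotated about its center. The name hflip_rotated_box is hypothetical.

```python
from typing import Tuple

Box = Tuple[float, float, float, float, float]


def hflip_rotated_box(box: Box, image_width: float) -> Box:
    """Mirror a (cx, cy, w, h, angle) box about the vertical center line of the image."""
    cx, cy, w, h, angle = box
    # The center is mirrored in x; the size is unchanged; the rotation direction reverses.
    return (image_width - cx, cy, w, h, -angle)
```

The image itself would be flipped with the usual image op; only the box needs the rotation-aware handling. Depending on the angle range a library settles on, the negated angle may also need to be wrapped back into that range.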
This is a bigger piece of work that we should consider after H2. Philip is right to say that our roadmap is currently full and this is quite a sizeable piece of work to squeeze in. As Aditya said, this is not just about adding support for transforms but also for operators. I would be in favour of reviewing the support of this as a whole as we would need to provide Transforms, Ops and Models that are compatible. IMO we should avoid adding more partial support for features in TorchVision; we already have quite a few things that are partially supported (Keypoints, Video Decoding etc). We are currently investing in closing any high priority gaps on the already supported tasks. Once this is done, we should stack rank big proposed features to see which of them are worth investing in. The situation I would like to avoid is having some support of Rotated BBoxes in some Transforms/Ops/Models but not in others. This is likely to make our Detection API more complex and harder to revamp.
🚀 Feature
A bit unsure about this feature.
Support Rotated Bounding Boxes in Torchvision.
Motivation
There is recent research on Rotated Bounding boxes which provides better detection results. I am not able to find highly cited results but a few of them are
I'm not sure about more papers. I think this is a fairly new topic and needs a bit more research.
Pitch
If possible we can also support Rotated models. In my opinion it might not be very feasible, since these models will take a lot of time and maintenance.
Alternatives
Currently, Detectron2 has great support for all the above. These operations are implemented in C++.
Additional context
Looks tricky and challenging. I guess it might also be too early to think about this. I think it needs a bit more research and some baseline.
cc @pmeier