yangxue0827 / STAR-MMRotate

Oriented object detection on STAR dataset.
https://linlin-dev.github.io/project/STAR.html
Apache License 2.0

h2rbox v2 test results? #3

Closed · xavibou closed this 2 months ago

xavibou commented 3 months ago

Hello,

Thanks for the code. I am able to train an H2RBox-v2 model on DOTAv1, and I obtain the following results by the end of training:

[Screenshot: final evaluation table, ~82.8 mAP on trainval]

I assume these correspond to the validation scores at the end of training, since the DOTA authors do not provide test annotations. Do the tables reported in the paper therefore correspond to results on the test set? Do you obtain them by uploading the predictions to the evaluation server?

Thanks

yangxue0827 commented 3 months ago

Yes, you are right.

yangxue0827 commented 3 months ago

I fixed a bug in the h2rbox-v2 config:

https://github.com/yangxue0827/RSG-MMRotate/blob/db7d2a4f97737399a0b9f72a0af85b351704dced/configs/h2rbox_v2p/h2rbox_v2p_r50_fpn_1x_dota_le90.py#L82-L95

The previous version still used rbox-supervised (i.e., fully supervised) learning instead of hbox-supervised (weakly supervised) learning.
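
To be clear, hbox (weak) supervision means training only ever sees the horizontal circumscribed box of each ground-truth rbox. A minimal sketch of that conversion (the helper name is mine, not from this repo):

import torch

def rbox_to_circumscribed_hbox(rbox: torch.Tensor) -> torch.Tensor:
    """Convert (cx, cy, w, h, theta) rboxes to their horizontal
    circumscribed boxes (cx, cy, W, H, 0), i.e. the weak labels."""
    cx, cy, w, h, theta = rbox.unbind(-1)
    cosa, sina = torch.cos(theta).abs(), torch.sin(theta).abs()
    W = w * cosa + h * sina  # width of the axis-aligned bounding box
    H = w * sina + h * cosa  # height of the axis-aligned bounding box
    return torch.stack((cx, cy, W, H, torch.zeros_like(theta)), -1)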

xavibou commented 3 months ago

Hi, I applied the changes to the h2rbox-v2 config file and got similar results to before. In fact, I am quite confused by the nested_projection method in the H2RBoxV2PHead class, which computes the target and predicted horizontal boxes from their respective oriented boxes for weak supervision.

def nested_projection(self, pred, target):
    # Target box as (x1, y1, x2, y2) in its own frame.
    target_xy1 = target[..., 0:2] - target[..., 2:4] / 2
    target_xy2 = target[..., 0:2] + target[..., 2:4] / 2
    target_projected = torch.cat((target_xy1, target_xy2), -1)
    pred_xy = pred[..., 0:2]
    pred_wh = pred[..., 2:4]
    # Angle of the prediction relative to the target's frame.
    da = pred[..., 4] - target[..., 4]
    cosa = torch.cos(da).abs()
    sina = torch.sin(da).abs()
    # Size of the prediction's circumscribed rectangle in that frame:
    # W = w|cos da| + h|sin da|, H = w|sin da| + h|cos da|.
    pred_wh = torch.matmul(
        torch.stack((cosa, sina, sina, cosa), -1).view(*cosa.shape, 2, 2),
        pred_wh[..., None])[..., 0]
    pred_xy1 = pred_xy - pred_wh / 2
    pred_xy2 = pred_xy + pred_wh / 2
    pred_projected = torch.cat((pred_xy1, pred_xy2), -1)
    return pred_projected, target_projected

Here, the target angle (i.e., the ground-truth angle) is used to compute pred_projected, since the angle difference da = pred[..., 4] - target[..., 4] is used to adjust the prediction's width and height pred_wh. Shouldn't we refrain from using the ground-truth angle, since we are not supposed to have that information?

yangxue0827 commented 3 months ago

@yuyi1005

yuyi1005 commented 3 months ago

The ground-truth angle is usually 0 or pi/2 for horizontal boxes, but when random rotation augmentation is applied, these target boxes can have an angle. "da" is the angle difference between the internal rotated box and the external circumscribed box (either rotated or horizontal). See Figure 4(b) in the paper, where the ground-truth box is also a rotated box due to random rotation augmentation.
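
Concretely, the 2x2 matrix of |cos|/|sin| values in nested_projection just computes the size of the prediction's circumscribed rectangle in the target's frame. A quick numeric check (the numbers are arbitrary):

import math
import torch

w, h = 4.0, 2.0                    # predicted box size
da = torch.tensor(math.pi / 6)     # 30 deg between pred and target frames
cosa, sina = torch.cos(da).abs(), torch.sin(da).abs()
# Equivalent to the 2x2 matmul in nested_projection:
W = w * cosa + h * sina            # ~4.46, circumscribed width
H = w * sina + h * cosa            # ~3.73, circumscribed height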

The result you provided (82.8% on the trainval set) seems correct. You can also run tools/test.py and upload the zip file to the DOTA server to obtain results on the test set.
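
Something like the standard MMRotate submission workflow should work (the checkpoint path below is a placeholder):

python tools/test.py configs/h2rbox_v2p/h2rbox_v2p_r50_fpn_1x_dota_le90.py \
    work_dirs/h2rbox_v2p_r50_fpn_1x_dota_le90/latest.pth \
    --format-only --eval-options submission_dir=work_dirs/Task1_results

This writes the merged Task1 result files for online evaluation; upload the generated archive to the DOTA server.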

xavibou commented 3 months ago

I see. So, if I understood correctly: after applying random rotation, the originally horizontal bounding box becomes oriented, and the code extracts the prediction's circumscribed bounding box oriented at that same angle.

For the results, I would love to test them on the server, but I can't get the confirmation email after signing up.

It is clearer now, thanks!

yangxue0827 commented 3 months ago

Gmail addresses may not be accepted for registration, as the server requires an organizational email suffix, e.g. sjtu.edu.cn or pjlab.org.cn.