pytorch / vision

Datasets, Transforms and Models specific to Computer Vision
https://pytorch.org/vision
BSD 3-Clause "New" or "Revised" License

[feature request] Support batches (arbitrary number of batch dims) for box_iou / generalized_box_iou / box_area / box_convert / clip_boxes_to_image #3478

Open vadimkantorov opened 3 years ago

vadimkantorov commented 3 years ago

Currently it supports only inputs of shape (N, 4) and (M, 4), producing an (N, M) output. I propose to also support (B, N, 4) and (B, M, 4) -> (B, N, M). It would also be nice if they supported an arbitrary number of batch dimensions.

This is useful for computing a cost matrix between predicted boxes and ground-truth boxes for a batch of frames. It can probably be done by adjusting the tensor indexing, something like this:

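from typing import Tuple

import torch
from torch import Tensor
from torchvision.ops.boxes import _upcast  # private torchvision helper

def _box_inter_union(boxes1: Tensor, boxes2: Tensor) -> Tuple[Tensor, Tensor]:
    # boxes1: (..., N, 4), boxes2: (..., M, 4), both in (x1, y1, x2, y2) format
    # box_area is itself part of this request, so areas are computed inline with ... indexing
    area1 = (boxes1[..., 2] - boxes1[..., 0]) * (boxes1[..., 3] - boxes1[..., 1])
    area2 = (boxes2[..., 2] - boxes2[..., 0]) * (boxes2[..., 3] - boxes2[..., 1])

    lt = torch.max(boxes1[..., :, None, :2], boxes2[..., None, :, :2])  # (..., N, M, 2)
    rb = torch.min(boxes1[..., :, None, 2:], boxes2[..., None, :, 2:])  # (..., N, M, 2)

    wh = _upcast(rb - lt).clamp(min=0)
    inter = wh[..., 0] * wh[..., 1]  # (..., N, M)

    union = area1[..., :, None] + area2[..., None, :] - inter
    return inter, union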
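A self-contained sketch of how batched `box_area`, `box_iou` and `generalized_box_iou` could look when every index uses `...`, so any number of leading batch dimensions broadcast through. The `batched_*` names are hypothetical local helpers, not torchvision API; boxes are assumed to be in `(x1, y1, x2, y2)` format, and the two inputs must have equal or broadcastable leading batch shapes. The usage lines build the kind of per-frame cost matrix (here, negative GIoU) described above.

```python
from typing import Tuple

import torch
from torch import Tensor


def batched_box_area(boxes: Tensor) -> Tensor:
    # (..., N, 4) in (x1, y1, x2, y2) format -> (..., N)
    return (boxes[..., 2] - boxes[..., 0]) * (boxes[..., 3] - boxes[..., 1])


def batched_box_inter_union(boxes1: Tensor, boxes2: Tensor) -> Tuple[Tensor, Tensor]:
    # boxes1: (..., N, 4), boxes2: (..., M, 4) -> inter, union: (..., N, M)
    area1 = batched_box_area(boxes1)
    area2 = batched_box_area(boxes2)
    lt = torch.max(boxes1[..., :, None, :2], boxes2[..., None, :, :2])  # (..., N, M, 2)
    rb = torch.min(boxes1[..., :, None, 2:], boxes2[..., None, :, 2:])  # (..., N, M, 2)
    wh = (rb - lt).clamp(min=0)
    inter = wh[..., 0] * wh[..., 1]
    union = area1[..., :, None] + area2[..., None, :] - inter
    return inter, union


def batched_box_iou(boxes1: Tensor, boxes2: Tensor) -> Tensor:
    inter, union = batched_box_inter_union(boxes1, boxes2)
    return inter / union


def batched_generalized_box_iou(boxes1: Tensor, boxes2: Tensor) -> Tensor:
    inter, union = batched_box_inter_union(boxes1, boxes2)
    iou = inter / union
    # Smallest enclosing box for every (i, j) pair
    lt = torch.min(boxes1[..., :, None, :2], boxes2[..., None, :, :2])
    rb = torch.max(boxes1[..., :, None, 2:], boxes2[..., None, :, 2:])
    wh = (rb - lt).clamp(min=0)
    area_c = wh[..., 0] * wh[..., 1]
    return iou - (area_c - union) / area_c


# Usage: 2 frames, 3 predicted boxes vs 2 ground-truth boxes per frame
pred = torch.tensor([[[0., 0., 2., 2.], [1., 1., 3., 3.], [5., 5., 6., 6.]]]).repeat(2, 1, 1)
gt = torch.tensor([[[0., 0., 2., 2.], [4., 4., 6., 6.]]]).repeat(2, 1, 1)
iou = batched_box_iou(pred, gt)                # (2, 3, 2)
cost = -batched_generalized_box_iou(pred, gt)  # (2, 3, 2) cost matrix per frame
```

Because the pairwise axes are always the last two before the coordinate axis, the same functions accept `(N, 4)`, `(B, N, 4)` or `(A, B, N, 4)` inputs without a separate code path.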
oke-aditya commented 3 years ago

Hi. Sorry for the delay.

I think this can be done with torch.vmap. Have a look at pytorch/issue/42368.

import torch
from torchvision.ops import box_iou

batched_iou = torch.vmap(box_iou)

Now we can call it on batched boxes:

iou = batched_iou(batched_boxes1, batched_boxes2)  # (B, N, 4), (B, M, 4) -> (B, N, M)

I don't have much experience with vmap, but I think this approach would be extensible. Can you give it a try, @vadimkantorov?
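A minimal sketch of the vmap approach, assuming PyTorch 2.x, where `torch.vmap` is a public API (it was still a prototype when this thread was written). To stay self-contained it uses a hypothetical unbatched helper `box_iou_2d`, written in the same style as torchvision's `box_iou`, instead of importing torchvision, and checks the vmapped result against an explicit Python loop over the batch.

```python
import torch
from torch import Tensor


def box_iou_2d(boxes1: Tensor, boxes2: Tensor) -> Tensor:
    # Unbatched pairwise IoU: (N, 4), (M, 4) -> (N, M), boxes in (x1, y1, x2, y2) format
    area1 = (boxes1[:, 2] - boxes1[:, 0]) * (boxes1[:, 3] - boxes1[:, 1])
    area2 = (boxes2[:, 2] - boxes2[:, 0]) * (boxes2[:, 3] - boxes2[:, 1])
    lt = torch.max(boxes1[:, None, :2], boxes2[None, :, :2])
    rb = torch.min(boxes1[:, None, 2:], boxes2[None, :, 2:])
    wh = (rb - lt).clamp(min=0)
    inter = wh[:, :, 0] * wh[:, :, 1]
    return inter / (area1[:, None] + area2[None, :] - inter)


# vmap maps over dim 0 of both inputs: (B, N, 4), (B, M, 4) -> (B, N, M)
batched_iou = torch.vmap(box_iou_2d)

# Each extra batch dimension needs one more vmap layer: (A, B, N, 4) -> (A, B, N, M)
doubly_batched_iou = torch.vmap(torch.vmap(box_iou_2d))

# Random valid boxes: top-left corner plus a non-negative width/height
torch.manual_seed(0)
xy1 = torch.rand(8, 5, 2)
boxes1 = torch.cat([xy1, xy1 + torch.rand(8, 5, 2)], dim=-1)  # (8, 5, 4)
xy2 = torch.rand(8, 3, 2)
boxes2 = torch.cat([xy2, xy2 + torch.rand(8, 3, 2)], dim=-1)  # (8, 3, 4)

iou = batched_iou(boxes1, boxes2)  # (8, 5, 3)
reference = torch.stack([box_iou_2d(b1, b2) for b1, b2 in zip(boxes1, boxes2)])
```

One trade-off worth noting: with vmap, every additional batch dimension needs its own wrapping, whereas `...` indexing inside the op handles any number of leading dimensions in a single definition.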

vadimkantorov commented 3 years ago

This already exists in detectron2: https://github.com/facebookresearch/detectron2/blob/cbbc1ce26473cb2a5cc8f58e8ada9ae14cb41052/detectron2/structures/boxes.py#L346

But given how common this use case is, I propose that this (and perhaps some related box ops) be included in core torchvision.
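For a related op from the title, `box_convert`, the same idea carries over by unbinding and stacking along the last axis. This is a sketch of two hypothetical helpers covering only the `xyxy` and `cxcywh` formats; they are not torchvision's `box_convert` signature, which takes `in_fmt`/`out_fmt` strings.

```python
import torch
from torch import Tensor


def batched_box_xyxy_to_cxcywh(boxes: Tensor) -> Tensor:
    # (..., 4) (x1, y1, x2, y2) -> (..., 4) (cx, cy, w, h); any leading dims are kept
    x1, y1, x2, y2 = boxes.unbind(-1)
    return torch.stack([(x1 + x2) / 2, (y1 + y2) / 2, x2 - x1, y2 - y1], dim=-1)


def batched_box_cxcywh_to_xyxy(boxes: Tensor) -> Tensor:
    # (..., 4) (cx, cy, w, h) -> (..., 4) (x1, y1, x2, y2)
    cx, cy, w, h = boxes.unbind(-1)
    return torch.stack([cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2], dim=-1)


# Usage: 2 frames with 1 box each, shape (2, 1, 4)
boxes = torch.tensor([[[0., 0., 4., 2.]], [[1., 1., 3., 5.]]])
centers = batched_box_xyxy_to_cxcywh(boxes)
roundtrip = batched_box_cxcywh_to_xyxy(centers)
```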