pytorch / vision

Datasets, Transforms and Models specific to Computer Vision
https://pytorch.org/vision
BSD 3-Clause "New" or "Revised" License
16k stars 6.92k forks source link

[RFC] Interest in box-IoU implementation that supports distance-IoU (DIoU) and complete-IoU (CIoU)? #3026

Closed HamsterHuey closed 2 years ago

HamsterHuey commented 3 years ago

🚀 Feature

Implement Distance and Complete IoU within torchvision to complement the current supported implementations of IoU and GIoU

Motivation

Distance and Complete IoU have been published about in the literature (Nov, 2019): https://arxiv.org/abs/1911.08287

They have been shown to be very effective when utilized as loss-terms in box-regression for object detectors in comparison to conventional IoU loss or even generalized IoU (GIoU), both of which currently have implementations in torchvision. They are also actively used in training state-of-the-art object detectors (eg: Yolo v4)

Pitch

I already have a working prototype for an implementation. How this would be structured is open-ended. I think it might be better to just have a single Box IoU method that supports multiple modes? This is the call-signature of my current implementation:

def box_iou(
        boxes1: torch.Tensor,
        boxes2: torch.Tensor,
        fmt: str = 'xcycwh',
        iou_type: str = 'ciou',
) -> torch.Tensor:
    """
    A general-purpose intersection-over-union (Jaccard index) estimation method
    for boxes that supports standard IoU, Generalized IoU, Distance IoU and
    Complete IoU.

    Arguments:
        boxes1 (Tensor[N, 4]): [N, 4] Tensor of N boxes
        boxes2 (Tensor[M, 4]): [M, 4] Tensor of M boxes
        fmt (str, optional): Specifies format of input boxes. One of {'xywh',
            'xyxy', 'xcycwh'}; Default is **'xcycwh'**
        iou_type (str, optional): One of {'iou', 'giou', 'ciou', diou'};
            Default is **'ciou'**

    Returns:
        iou (Tensor[N, M]): the NxM matrix containing the pairwise
            IoU values for every element in boxes1 and boxes2. The type of IoU
            metric is determined by the input ``iou_type`` param.
    """

Obviously, the fmt param above is entirely optional. I've found it convenient to support the different formats, but it is easily stripped out if we want to enforce all boxes be in XYXY format.

I've written this up by adapting the current implementation of GIoU and there shouldn't be any extra overhead for any single pathway for the IoU type. Happy to contribute if this is of interest to the community.

oke-aditya commented 3 years ago

Hmm, orignally while adding gIoU Issue #2545 also proposed to add CIoU and DIoU.

fmt parameter I think would need to be implemented for other ops too for consistency. I think adding optional fmt parameter won't cause a breaking change.

E.g. box_area, clip_boxes_to_image etc. too don't take this parameter.

For all torchvision ops the format used is XYXY. To convert into XYXY from YOLO / COCO users can use box_convert from ops.

@fmassa should ops make use of fmt parameter ? It extends the functionality. Previously before adding ops we didn't have box_convert so we couldn't.

HamsterHuey commented 3 years ago

I'd personally vote for a simpler API and removing fmt from here. Like you said, box_convert exists for this reason. It's probably best for all the other methods to work only on XYXY format, otherwise you end up with a bunch of format handling overhead in each new method that operates on boxes.

senarvi commented 2 years ago

I support the idea of adding CIoU and DIoU. I agree with @HamsterHuey that a simpler API is better than every function doing also format conversion.

There's also one thing that I don't like with the current box_iou / generalized_box_iou interface: They take two input arguments, boxes1 and boxes2 and return a matrix of output values for every possible pair (box1, box2), where box1 is from the first input argument and box2 is from the second one. What e.g. YOLOv4 needs is just the IoU values of the matching boxes, so one needs to call box_iou(boxes1, boxes2).diagonal(). This is slow (O(N^2) instead of O(N)) and it's not consistent with other PyTorch operations. For example, if you want to calculate the sum of matching elements of two vectors, you call x + y, and if you want a matrix of all possible combinations, then you call x[:, None] + y[None, :]. If we want the functions to support producing all possible combinations, I suggest making it optional.

oke-aditya commented 2 years ago

We have implemented this in release 0.13 Hope you find it useful @senarvi @HamsterHuey

cc @datumbox

datumbox commented 2 years ago

Ooops, yes thanks for the ping. It's available on v0.13. I'll close the issue.