Closed HamsterHuey closed 2 years ago
Hmm, orignally while adding gIoU Issue #2545 also proposed to add CIoU and DIoU.
fmt
parameter I think would need to be implemented for other ops too for consistency.
I think adding optional fmt
parameter won't cause a breaking change.
E.g. box_area
, clip_boxes_to_image
etc. too don't take this parameter.
For all torchvision ops the format used is XYXY.
To convert into XYXY from YOLO / COCO users can use box_convert
from ops.
@fmassa should ops
make use of fmt
parameter ?
It extends the functionality. Previously before adding ops we didn't have box_convert
so we couldn't.
I'd personally vote for a simpler API and removing fmt
from here. Like you said, box_convert
exists for this reason. It's probably best for all the other methods to work only on XYXY format, otherwise you end up with a bunch of format handling overhead in each new method that operates on boxes.
I support the idea of adding CIoU and DIoU. I agree with @HamsterHuey that a simpler API is better than every function doing also format conversion.
There's also one thing that I don't like with the current box_iou / generalized_box_iou interface: They take two input arguments, boxes1
and boxes2
and return a matrix of output values for every possible pair (box1, box2), where box1 is from the first input argument and box2 is from the second one. What e.g. YOLOv4 needs is just the IoU values of the matching boxes, so one needs to call box_iou(boxes1, boxes2).diagonal(). This is slow (O(N^2) instead of O(N)) and it's not consistent with other PyTorch operations. For example, if you want to calculate the sum of matching elements of two vectors, you call x + y
, and if you want a matrix of all possible combinations, then you call x[:, None] + y[None, :]
. If we want the functions to support producing all possible combinations, I suggest making it optional.
We have implemented this in release 0.13 Hope you find it useful @senarvi @HamsterHuey
cc @datumbox
Ooops, yes thanks for the ping. It's available on v0.13. I'll close the issue.
🚀 Feature
Implement Distance and Complete IoU within torchvision to complement the current supported implementations of IoU and GIoU
Motivation
Distance and Complete IoU have been published about in the literature (Nov, 2019): https://arxiv.org/abs/1911.08287
They have been shown to be very effective when utilized as loss-terms in box-regression for object detectors in comparison to conventional IoU loss or even generalized IoU (GIoU), both of which currently have implementations in torchvision. They are also actively used in training state-of-the-art object detectors (eg: Yolo v4)
Pitch
I already have a working prototype for an implementation. How this would be structured is open-ended. I think it might be better to just have a single Box IoU method that supports multiple modes? This is the call-signature of my current implementation:
Obviously, the
fmt
param above is entirely optional. I've found it convenient to support the different formats, but it is easily stripped out if we want to enforce all boxes be in XYXY format.I've written this up by adapting the current implementation of GIoU and there shouldn't be any extra overhead for any single pathway for the IoU type. Happy to contribute if this is of interest to the community.