pytorch / vision

Datasets, Transforms and Models specific to Computer Vision
https://pytorch.org/vision
BSD 3-Clause "New" or "Revised" License

NMS discards box when IoU == iou_threshold #7165

Open junpeiz opened 1 year ago

junpeiz commented 1 year ago

🐛 Describe the bug

According to the NMS documentation [1], NMS discards every overlapping box with IoU > iou_threshold, so a box with IoU == iou_threshold should be kept. The following snippet shows that this is not the case.

import torch
import torchvision

input_boxes = torch.tensor([[1., 1., 2., 3.], [0., 0., 2., 2.]])
input_scores = torch.tensor([3., 2.])
# Both boxes should be kept, but only one is.
torchvision.ops.nms(input_boxes, input_scores, iou_threshold=0.2)
# Verify that the IoU equals iou_threshold.
torchvision.ops.box_iou(torch.tensor([[1., 1., 2., 3.]]), torch.tensor([[0., 0., 2., 2.]]))
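
For reference, the IoU of these two boxes can be worked out by hand: the areas are 2 and 4, the intersection is 1, and the union is 2 + 4 - 1 = 5, so the exact IoU is 1/5. A minimal pure-Python sketch of that arithmetic follows; the helper names `box_area` and `iou` are illustrative and are not torchvision APIs.

```python
def box_area(box):
    """Area of a box given as (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = box
    return (x2 - x1) * (y2 - y1)


def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    # Clamp at zero so non-overlapping boxes give an empty intersection.
    inter = max(ix2 - ix1, 0.0) * max(iy2 - iy1, 0.0)
    union = box_area(a) + box_area(b) - inter
    return inter / union


box_a = (1.0, 1.0, 2.0, 3.0)
box_b = (0.0, 0.0, 2.0, 2.0)
iou_ab = iou(box_a, box_b)
print(iou_ab)  # 0.2 (computed in Python's float64)
```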

[1] https://pytorch.org/vision/main/generated/torchvision.ops.nms.html

Versions

Versions of relevant libraries:
[pip3] numpy==1.21.0
[pip3] torch==1.13.1
[pip3] torchvision==0.14.1
[conda] numpy 1.21.0 pypi_0 pypi
[conda] torch 1.13.1 pypi_0 pypi
[conda] torchvision 0.14.1 pypi_0 pypi

NicolasHug commented 1 year ago

I think this has to do with float precision. The computed IoU on float32 is actually slightly bigger than the 0.2 threshold.

torch.set_printoptions(precision=10)
torchvision.ops.box_iou(torch.tensor([[1., 1., 2., 3.]]), torch.tensor([[0., 0., 2., 2.]]))
# tensor([[0.2000000030]])

which is not exactly equal to 0.2.

Convert everything explicitly to float64 and you'll see that both boxes are kept.
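
The printed value can be reproduced without PyTorch. The sketch below rounds 1/5 to float32 using the standard-library `struct` module (the helper `to_float32` is a name made up for this illustration) and shows that the result is simply the float32 nearest to 0.2. Whether it counts as "bigger than 0.2" depends on the precision of the value it is compared against.

```python
import struct


def to_float32(x):
    """Round a Python float (float64) to the nearest float32, returned as float64."""
    return struct.unpack("f", struct.pack("f", x))[0]


iou_f32 = to_float32(1.0 / 5.0)

print(f"{iou_f32:.10f}")           # 0.2000000030
print(iou_f32 > 0.2)               # True: larger than the float64 value of 0.2
print(iou_f32 == to_float32(0.2))  # True: equal to the float32 value of 0.2
```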

junpeiz commented 1 year ago

Thank you for the info! In my project, I cannot use fp64 due to some restrictions. I tested both fp32 and fp64, and confirmed that this issue only happens in fp32.

import torch
import torchvision

# Default is fp32
input_boxes = torch.tensor([[1., 1., 2., 3.], [0., 0., 2., 2.]])
input_scores = torch.tensor([3., 2.])

# Explicitly use fp64
input_boxes_fp64 = torch.tensor([[1., 1., 2., 3.], [0., 0., 2., 2.]], dtype=torch.float64)
input_scores_fp64 = torch.tensor([3., 2.], dtype=torch.float64)

# Output is tensor([0])
torchvision.ops.nms(input_boxes, input_scores, iou_threshold=0.2)

# Output is tensor([0, 1])
torchvision.ops.nms(input_boxes_fp64, input_scores_fp64, iou_threshold=0.2)

However, the claim that "the computed IoU on float32 is actually slightly bigger than the 0.2 threshold" doesn't hold on my end, as the following snippet shows.

# Output is tensor([[True]]).
torchvision.ops.box_iou(torch.tensor([[1., 1., 2., 3.]]), torch.tensor([[0., 0., 2., 2.]])) == 0.2

Also, let's assume the IoU is 0.2000000030, as you reported. According to the NMS doc, a box is discarded when its IoU is larger than the threshold, which corresponds to the following snippet. The box should then not be discarded, because the > check returns False, right? That's why I still think this is a bug in the float32 path.

# Output is tensor([[False]])
torchvision.ops.box_iou(torch.tensor([[1., 1., 2., 3.]]), torch.tensor([[0., 0., 2., 2.]])) > 0.2000000030

YifanShenSZ commented 1 year ago

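I think the reason is that iou_threshold is typed as double rather than following the input tensor's dtype. A float32 IoU is then widened to double for the comparison, and the float32 value nearest to 0.2 is slightly larger than the double 0.2, so the > check passes and the lower-precision input suffers.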

That line of code is here. Changing it to scalar_t iou_threshold should be able to fix this issue.
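
To illustrate the effect of the threshold's precision, here is a rough pure-Python sketch of greedy NMS. It is not torchvision's implementation: `nms_sketch`, `to_float32`, and the `threshold_as_float32` flag are hypothetical names introduced only for this example, and float32 arithmetic is emulated by rounding each IoU through `struct`.

```python
import struct


def to_float32(x):
    """Round a float64 to the nearest float32 (emulates float32 storage)."""
    return struct.unpack("f", struct.pack("f", x))[0]


def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    iw = max(min(a[2], b[2]) - max(a[0], b[0]), 0.0)
    ih = max(min(a[3], b[3]) - max(a[1], b[1]), 0.0)
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def nms_sketch(boxes, scores, iou_threshold, threshold_as_float32):
    """Greedy NMS sketch: drop a box if its IoU with a kept box is > threshold."""
    # Hypothetical switch: compare against a float32 or a float64 threshold.
    thr = to_float32(iou_threshold) if threshold_as_float32 else iou_threshold
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # The IoU is rounded to float32 to mimic float32 inputs.
        if all(to_float32(iou(boxes[i], boxes[k])) <= thr for k in keep):
            keep.append(i)
    return keep


boxes = [(1.0, 1.0, 2.0, 3.0), (0.0, 0.0, 2.0, 2.0)]
scores = [3.0, 2.0]

print(nms_sketch(boxes, scores, 0.2, threshold_as_float32=False))  # [0]
print(nms_sketch(boxes, scores, 0.2, threshold_as_float32=True))   # [0, 1]
```

With a float64 threshold the emulated float32 IoU compares as strictly greater and the second box is dropped, matching the fp32 output reported above. With the threshold rounded to float32 the two values are equal and both boxes are kept.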