rwightman / efficientdet-pytorch

A PyTorch impl of EfficientDet faithful to the original Google impl w/ ported weights
Apache License 2.0

[BUG] BBoxes not clipped or removed in RandomResizePad and ResizePad #185

Closed: mkmenta closed this issue 3 years ago

mkmenta commented 3 years ago

First of all, thanks for your hard and great work!

Describe the bug

I think the bounding boxes are not clipped or removed correctly in the RandomResizePad and ResizePad data transforms when their coordinates extend past the right or bottom edge of the output image.

To Reproduce

import random

import numpy as np
from PIL import Image

from effdet.data import RandomResizePad

img = Image.new("RGB", (1000, 1500), color=(0, 0, 0))
target = {
    "bbox": np.array([[649, 349, 1400, 703],
                      [1400, 270, 1480, 434]], dtype=np.float64),
    "cls": np.array([1, 2])
}

random.seed(0)
rrp = RandomResizePad(target_size=512, scale=(1.8, 1.8))
img_t, target_t = rrp(img, target)

print(target_t['bbox'])

The code above outputs:

[[ 88.7456 172.4256 550.16   389.9232]
 [550.16   123.888  599.312  224.6496]]

If I'm not mistaken, this is incorrect: we set target_size=512, yet the y2 coordinate of the first bbox is 550.16. The second bbox has the same problem: it lies entirely outside the 512x512 image.
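The printed coordinates can be reproduced by hand, which helps show why no clipping took place. The sketch below assumes the transform scales the image by the smaller of the two per-axis ratios and then subtracts a crop offset from each box. The offsets (310, 42) are inferred from the printed output rather than read from the library, and the boxes are assumed to be in yxyx order, which matches the y2 reading above.

```python
import numpy as np

# Values from the reproduction: 1000x1500 (width x height) image,
# target_size=512, scale factor 1.8.
width, height = 1000, 1500
target = 512
scale_factor = 1.8

# Assumption: the image is scaled by the smaller of the per-axis ratios.
img_scale = min(scale_factor * target / height, scale_factor * target / width)
scaled_h, scaled_w = int(height * img_scale), int(width * img_scale)

# Assumption: crop offsets (y, x) inferred from the printed output above.
offset_y, offset_x = 310, 42

# Boxes in yxyx order (assumed).
boxes = np.array([[649, 349, 1400, 703],
                  [1400, 270, 1480, 434]], dtype=np.float64)
shifted = boxes * img_scale - np.array([offset_y, offset_x, offset_y, offset_x])

print(scaled_h, scaled_w)
print(shifted.round(4))
```

Under these assumptions the scaled image is 921x614 (height x width), so clipping to (scaled_h, scaled_w) leaves every coordinate untouched even though several exceed 512.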

Expected behavior

I think the output of that code should be:

[[ 88.7456 172.4256 512.     389.9232]]

(first bbox clipped and second bbox removed).
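A minimal sketch of the expected clip-then-remove behavior, applied to the boxes printed above. The helper name `clip_and_filter` is hypothetical and is not part of effdet; boxes are assumed to be in yxyx order, and a box is treated as removed when clipping leaves it with zero height or width.

```python
import numpy as np

def clip_and_filter(boxes, classes, limit_hw):
    """Clip yxyx boxes to [0, h] x [0, w] and drop boxes left with no area.

    Hypothetical helper for illustration; not the library's implementation.
    """
    h, w = limit_hw
    clipped = boxes.copy()
    clipped[:, [0, 2]] = clipped[:, [0, 2]].clip(0, h)  # y1, y2
    clipped[:, [1, 3]] = clipped[:, [1, 3]].clip(0, w)  # x1, x2
    keep = (clipped[:, 2] > clipped[:, 0]) & (clipped[:, 3] > clipped[:, 1])
    return clipped[keep], classes[keep]

# The boxes printed by the reproduction code.
out_boxes = np.array([[88.7456, 172.4256, 550.16, 389.9232],
                      [550.16, 123.888, 599.312, 224.6496]])
out_cls = np.array([1, 2])

kept_boxes, kept_cls = clip_and_filter(out_boxes, out_cls, (512, 512))
print(kept_boxes)
print(kept_cls)
```

The second box collapses to zero height once both of its y coordinates are clipped to 512, so only the first box and its class remain.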

Screenshots

A visualization of the problem. Before: [image: before] After: [image: after]

Suggested fix

Change lines 100 and 162 of transforms.py from:

clip_boxes_(bbox, (scaled_h, scaled_w))

to

clip_boxes_(bbox, self.target_size)

With that change, the code I wrote to reproduce the issue outputs:

[[ 88.7456 172.4256 512.     389.9232]]

Visualization of the transformed image: [image: after_new]

Thank you in advance!

rwightman commented 3 years ago

@mkmenta thanks, looks like a potential issue, I'll dig in more over next few days

rwightman commented 3 years ago

I think it should be as below, so that boxes are clipped to the target image bounds or to the letterboxed (scaled) image bounds, whichever is smaller.

clip_boxes_(bbox, (min(scaled_h, self.target_size[0]), min(scaled_w, self.target_size[1])))
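To illustrate why the minimum is taken per axis, the sketch below compares the two cases. The helper name `clip_limit` is hypothetical. The 921x614 scaled size is computed from the reproduction above (scale 1.8). The 512x341 size is an assumed letterbox case for the same 1000x1500 image resized to fit a 512x512 target with no extra scaling.

```python
def clip_limit(scaled_hw, target_hw):
    """Per-axis clip bound: the smaller of scaled image size and target size.

    Hypothetical helper mirroring the min(...) expression above.
    """
    return (min(scaled_hw[0], target_hw[0]), min(scaled_hw[1], target_hw[1]))

target_hw = (512, 512)

# Crop case: the scaled image is larger than the target, so the target bounds win.
crop_limit = clip_limit((921, 614), target_hw)

# Letterbox case: the scaled image is narrower than the target, so the scaled
# width wins and boxes cannot extend into the padded region.
letterbox_limit = clip_limit((512, 341), target_hw)

print(crop_limit, letterbox_limit)
```

Clipping to self.target_size alone would handle the crop case, but in the letterbox case it would allow a box to extend into the padding on the right.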

mkmenta commented 3 years ago

That's true! Sorry, I missed that.

rwightman commented 3 years ago

@mkmenta I'm testing #186 in training