nwojke / deep_sort

Simple Online Realtime Tracking with a Deep Association Metric
GNU General Public License v3.0
5.16k stars 1.45k forks source link

Fix bbox bottom right corner calculation #314

Open st235 opened 1 year ago

st235 commented 1 year ago

The boxes are given in (top left x, top left y, width, height) format. In order to get bottom right corner we need to add width and height to x and y correspondingly and subtract 1.

On the line 62 in tools/generate_detections.py you treat image rect correctly by subtracting 1 from the image box, which is originally (0, 0, w, h) and after subtracting (0, 0, w - 1, h - 1).

bbox[2:] = np.minimum(np.asarray(image.shape[:2][::-1]) - 1, bbox[2:])

however, you don't do it in other places across the repo, for example, on line 57 in the same file

# convert to top left, bottom right
bbox[2:] += bbox[:2]

By not doing so, you get a pixel that is diagonally to the bottom right corner. This minor inconsistency is really confusing.

There are 2 ways of how to resolve it:

  1. Fix rect calculation across the codebase (subtract 1 from other rects when converted to (left, top, right, bottom))
  2. Make image rect consistent to other rects and do not subtract 1 for the image right bottom corner

I believe the first approach is more reasonable to take as it also fixes how iou is calculated. Below is an example of how one can figure out the right bottom corner of a rect:

Screenshot 2023-06-30 at 11 24 44

This is not a critical issue and should not dramatically affect metrics though would be really nice to fix.

st235 commented 1 year ago

@nwojke can you, please, take a look?

st235 commented 12 months ago

@nwojke any luck you can take a look?

st235 commented 11 months ago

@nwojke can you, please, take a look?

SajjadPSavoji commented 7 months ago

As far as I know W = x2 - x1 and H = y2 -y1.

According to that x2(right) = x1(Let) + W not "x2(right) = x1(Let) + W -1".

Maybe you are changing your bboxes in other parts of your implementation.

st235 commented 7 months ago

Hello @SajjadPSavoji

Thank you for the provided feedback. I really appreciate it.

As far as I know W = x2 - x1 and H = y2 -y1. According to that x2(right) = x1(Let) + W not "x2(right) = x1(Let) + W -1". Maybe you are changing your bboxes in other parts of your implementation.

Check out the places I was highlighting in the original comment. I agree, the problem seems a bit tricky.

There are a few inconsistencies around this place.

The original comment on the line 32 says

    bbox : array_like
        The bounding box in format (x, y, width, height).

I.e. if you have a one pixel box, let's say (0, 0, 1, 1), then it seems it is occupying the only pixel at the position 0,0.

Both ways of handling this box would be correct: either consider the right and bottom corners of it in inclusive manner or exclusive one.

Though, the lines 57 and 62 seem inconsistent to each other.

The viewport on line 62 is considered originally as (0, 0, w, h). Right?

    # clip at image boundaries
    bbox[:2] = np.maximum(0, bbox[:2])
    bbox[2:] = np.minimum(np.asarray(image.shape[:2][::-1]) - 1, bbox[2:])

And the box is clipped by (0, 0, w - 1, h - 1).

At line 57 we don't do the same thing.

# convert to top left, bottom right
bbox[2:] += bbox[:2]
bbox = bbox.astype(np.int)

Imagine we have a bounding box (0, 0, viewport_width, viewport_height). Consider those are the width and height of the bounding box (from the pydoc) we trim them at line 62, and the box becomes (0, 0, viewport_width - 1, viewport_height - 1). At the same time, there is nothing wrong with the box having the same width and height as the viewport as this box will be the box with the size of the whole image and it seems it should not be clipped in that case.

As I said before, the inconsistency seems rather minor. However, to fix the issue, we can do a different change, for example, instead of changing boxes handling we can change the viewport handling in here.

Something similar to this

    bbox[2:] = np.minimum(np.asarray(image.shape[:2][::-1]), bbox[2:])

Hopefully, the explanation seems reasonable. What do you think?

P.S.: regardless of the way to fix the issue, I think without receiving the initial maintainers feedback there is nothing we can do about it.