got-10k / toolkit

Official Python toolkit for generic object tracking benchmark GOT-10k and beyond
http://got-10k.aitestunion.com/
MIT License

suspicious + 1 in VOT AABBox calculation #16

Open jonathantompson opened 5 years ago

jonathantompson commented 5 years ago

https://github.com/got-10k/toolkit/blob/b2428f6e378c311ab4ccea3b93d0da5b54c74efc/got10k/datasets/vot.py#L222

    def _corner2rect(self, corners, center=False):
        cx = np.mean(corners[:, 0::2], axis=1)
        cy = np.mean(corners[:, 1::2], axis=1)

        x1 = np.min(corners[:, 0::2], axis=1)
        x2 = np.max(corners[:, 0::2], axis=1)
        y1 = np.min(corners[:, 1::2], axis=1)
        y2 = np.max(corners[:, 1::2], axis=1)

        area1 = np.linalg.norm(corners[:, 0:2] - corners[:, 2:4], axis=1) * \
            np.linalg.norm(corners[:, 2:4] - corners[:, 4:6], axis=1)
        area2 = (x2 - x1) * (y2 - y1)
        scale = np.sqrt(area1 / area2)
        w = scale * (x2 - x1) + 1  # <-- This doesn't look right.
        h = scale * (y2 - y1) + 1

This actually mimics an old version of VOT that did not have REGION_LEGACY_RASTERIZATION. In any case, if corners is already axis-aligned, then scale = 1.0 in the code above, and the width and height each gain an extra pixel. It's a small error, but I think this is a bug.
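The axis-aligned case can be checked numerically. Below is a minimal sketch that re-implements only the width and height lines of the quoted snippet, for a single polygon and in plain Python instead of NumPy. The function name equal_area_size and the test rectangle are made up for illustration and are not part of the toolkit.

```python
import math

def equal_area_size(corners):
    """Width and height as computed in the quoted snippet, for one 8-value polygon."""
    xs, ys = corners[0::2], corners[1::2]
    x1, x2 = min(xs), max(xs)
    y1, y2 = min(ys), max(ys)
    # Product of two adjacent side lengths (the exact area only for rectangles).
    area1 = math.dist(corners[0:2], corners[2:4]) * math.dist(corners[2:4], corners[4:6])
    # Area of the axis-aligned bounding box.
    area2 = (x2 - x1) * (y2 - y1)
    scale = math.sqrt(area1 / area2)
    return scale * (x2 - x1) + 1, scale * (y2 - y1) + 1

# Hypothetical axis-aligned rectangle spanning x in [10, 30] and y in [20, 60].
axis_aligned = [10.0, 20.0, 30.0, 20.0, 30.0, 60.0, 10.0, 60.0]
w, h = equal_area_size(axis_aligned)
print(w, h)  # 21.0 41.0, i.e. one more than the 20 x 40 extent of the vertices
```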

Additional question 1: Why are you selecting a bounding box with the same area as the oriented bounding box? Does this maximize iou in some meaningful / principled way? Or is this just a heuristic?

Additional question 2: VOT + TRAX does not do this conversion if a Rectangular region is requested. VOT does a vanilla min and max over the input vertices. If you want to be consistent with the VOT toolkit maybe the above implementation should be opt-in to avoid confusion?

huanglianghua commented 5 years ago

Hi, the _corner2rect is just used to facilitate tracking, since most trackers are based on rectangles instead of corners (you can set anno_type='rect'/'default' to switch between rectangles and corners). It is NEVER used in evaluation as well as failure recovery of VOT experiments.

We use poly_iou for these (see lines 140, 263, 282, and 566 of experiments/vot.py). The function takes corners directly as input and calculates the IoU between polygons, which should be consistent with VOT, though we still need to check the results further.

Additional question 1: Why are you selecting a bounding box with the same area as the oriented bounding box? Does this maximize iou in some meaningful / principled way? Or is this just a heuristic?

The same principle as _corner2rect has been used in many trackers, such as SRDCF, ECO, etc. It may be based on the hypothesis that the area of an object should not change, no matter how it rotates in the plane.

huanglianghua commented 5 years ago

Regarding the line h = scale * (y2 - y1) + 1:

For example, when y2 = 1 and y1 = 0, h should be 2 (2 pixels in total) instead of 1. Here we assume the corners lie inside the object rather than outside it.

jonathantompson commented 5 years ago

Thanks so much for the fast reply! I really do appreciate the clarifying answers.

To address your first point:

"It is NEVER used in evaluation as well as failure recovery of VOT experiments."

I respectfully disagree. It's not used to calculate metrics, but it is the first-frame GT box provided to the tracker. That makes it part of the evaluation framework. IMO you'd want that to match (using default settings) what you get from trax. The bottom line is that if you run a tracker through your framework, collect some metrics, and then run the same tracker through the official VOT framework, the very first frame's predictions won't match (since the starting BBox doesn't match), which will cause the metrics not to match. Admittedly the difference will be tiny, but it might cause some confusion / headaches.

Otherwise, you've done an AMAZING job of matching their eval. I'm super impressed. I completely understand that the work done in this repo is not trivial. Thanks so much for sharing it with the community!

To clarify: I understand your general sentiment. If the user gets the polygon from trax it's up to them to decide how to convert that to AABBoxes. Fair point. However, what is implemented for AABBoxes in this repo does not match what is implemented by VOT + TRAX.

Let's not take my word for it, let's look at an example:

Using the official VOT toolkit python interface defined here:

https://github.com/votchallenge/vot-toolkit/blob/07081d6cd02edf209702d90681feaa20138d5d57/tracker/examples/python/vot.py#L82

and calling the constructor with region_format='rectangle'.

Then running workspace/run_test.m and selecting ants1, trax gives a first frame region of:

[128.72, 458.36, 28.11, 71.05]

(for xmin, ymin, width and height). The corresponding first line of ants1/groundtruth.txt:

137.21,458.36,156.83,460.78,148.35,529.41,128.72,526.99

which implies:

xmin = 128.72  <-- SAME AS TRAX'S xmin
xmax = 156.83  
ymin = 458.36  <-- SAME AS TRAX'S ymin
ymax = 529.41

ymax  - ymin = 71.05 <-- SAME AS TRAX'S HEIGHT
xmax - xmin = 28.11 <-- SAME AS TRAX'S WIDTH

So two things jump out.

  1. VOT + TRAX does not use constant area boxes when converting from "polygons" to "rectangles". It does a simple AABBox bounding volume.

  2. Your definition of width and height being a count in the number of pixels does not match VOT + TRAX's definition. They do not include a + 1 term.
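The comparison above can be reproduced in a few lines. This is a minimal sketch using only the two sets of numbers quoted in this comment; polygon_to_aabb is a hypothetical helper name, not a function from either toolkit.

```python
import math

# First line of ants1/groundtruth.txt, as quoted above: x1,y1,...,x4,y4.
polygon = [137.21, 458.36, 156.83, 460.78, 148.35, 529.41, 128.72, 526.99]
# First-frame region reported by trax: xmin, ymin, width, height.
trax_region = [128.72, 458.36, 28.11, 71.05]

def polygon_to_aabb(corners):
    """Plain min/max bounding box over the vertices, with no +1 term."""
    xs, ys = corners[0::2], corners[1::2]
    xmin, ymin = min(xs), min(ys)
    return [xmin, ymin, max(xs) - xmin, max(ys) - ymin]

aabb = polygon_to_aabb(polygon)
# Compare with a small tolerance to absorb floating-point rounding.
matches = all(math.isclose(a, b, abs_tol=1e-6) for a, b in zip(aabb, trax_region))
print(aabb, matches)
```

With these inputs, matches comes out True: the plain min/max box agrees with the trax region to within rounding.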

I think that VOT + TRAX's definition of width and height is consistent with what I've seen in CV over the years. Pixels are typically defined as center sampled, so a pixel at index 0 is sampled at location 0.5, and a pixel at index 1 is sampled at location 1.5. So if you have a bounding volume from index 0 to 1, the width of that volume is the width of the sampled region, which is 1. I think the +1 value is a classic fencepost error, although in this case I totally understand how you got there and you're free to define it however you want :-) But the reality is that it's clearly not the definition that TRAX uses, and this is the point I was making with my original post.
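The two conventions can be put side by side. This is only an illustrative sketch of the center-sampling argument; the helper names are hypothetical and do not come from VOT, TRAX, or this toolkit.

```python
def sample_position(index):
    """Location of a center-sampled pixel along one axis."""
    return index + 0.5

def extent_between_samples(first, last):
    """Distance between the sample locations of two pixel indices."""
    return sample_position(last) - sample_position(first)

def pixel_count(first, last):
    """Number of pixels from `first` to `last` inclusive (the +1 convention)."""
    return last - first + 1

print(extent_between_samples(0, 1))  # 1.0
print(pixel_count(0, 1))             # 2
```

The first number is the width in the sense of the example above; the second is what the + 1 term produces.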

Hope that clarifies things.

huanglianghua commented 5 years ago

Hi, thank you for the detailed explanation. I think you are right. The _corner2rect in the current implementation is adapted from Martin's MATLAB code rather than from the VOT toolkit + trax. It would be better to provide an option that is consistent with trax by default. I'll implement it later to make sure the errors are as small as possible.
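One possible shape for such an option is sketched below. It is not the toolkit's implementation: the function name corners_to_rect and the mode argument are hypothetical, it handles a single polygon in plain Python, and centering the equal-area box on the mean of the corners is an assumption (the snippet quoted at the top of the thread computes cx and cy but is cut off before using them).

```python
import math

def corners_to_rect(corners, mode="trax"):
    """Convert one 8-value polygon to (x, y, w, h).

    `mode` is a hypothetical switch, not an existing toolkit argument.
    """
    xs, ys = corners[0::2], corners[1::2]
    x1, x2, y1, y2 = min(xs), max(xs), min(ys), max(ys)
    if mode == "trax":
        # Plain bounding box over the vertices, no +1 term.
        return (x1, y1, x2 - x1, y2 - y1)
    if mode == "equal_area":
        # Legacy behaviour from the snippet quoted in this thread.
        area1 = math.dist(corners[0:2], corners[2:4]) * math.dist(corners[2:4], corners[4:6])
        scale = math.sqrt(area1 / ((x2 - x1) * (y2 - y1)))
        w = scale * (x2 - x1) + 1
        h = scale * (y2 - y1) + 1
        # Assumption: center the box on the mean of the corners.
        cx, cy = sum(xs) / len(xs), sum(ys) / len(ys)
        return (cx - w / 2, cy - h / 2, w, h)
    raise ValueError(f"unknown mode: {mode!r}")

# Hypothetical axis-aligned rectangle spanning x in [10, 30] and y in [20, 60].
box = [10.0, 20.0, 30.0, 20.0, 30.0, 60.0, 10.0, 60.0]
print(corners_to_rect(box))                # (10.0, 20.0, 20.0, 40.0)
print(corners_to_rect(box, "equal_area"))  # (9.5, 19.5, 21.0, 41.0)
```

Making the plain min/max conversion the default keeps the first-frame box identical to what trax hands to a tracker, while the equal-area variant stays available for trackers that were tuned with it.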