Why do -1 while calculating width and height in imagenet vid dataset?

got-10k / toolkit

Official Python toolkit for generic object tracking benchmark GOT-10k and beyond

http://got-10k.aitestunion.com/

MIT License

Why do -1 while calculating width and height in imagenet vid dataset? #23

Closed amoudgl closed 5 years ago

amoudgl commented 5 years ago

https://github.com/got-10k/toolkit/blob/83aafbf4345d1640eac33a39d98b0fa103a92c6d/got10k/datasets/vid.py#L126-L137

In line 137 above, why is there a - 1 when calculating width and height?

Shouldn't it be just anno[:, 2:] -= anno[:, :2], which is the usual practice when dealing with the Pascal VOC format [xmin, ymin, xmax, ymax]? Or anno[:, 2:] -= anno[:, :2] + 1, if you wish to count the number of pixels in the bounding box?

Please let me know if I am wrong somewhere. :)

huanglianghua commented 5 years ago

The xmin and xmax are assumed to be inclusive pixel coordinates, i.e. both lie within the object. So the object spans width = xmax - xmin + 1 pixels. The same holds for the height: height = ymax - ymin + 1.
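As a quick sanity check of the inclusive convention, here is a minimal sketch that counts the covered pixels directly. The helper name `inclusive_extent` and the sample coordinates are made up for illustration and are not part of the toolkit.

```python
def inclusive_extent(lo, hi):
    """Number of pixels covered when both lo and hi lie inside the object."""
    return hi - lo + 1

# Hypothetical example: the object occupies pixel columns 3 through 7.
xmin, xmax = 3, 7
covered = list(range(xmin, xmax + 1))  # columns 3, 4, 5, 6, 7
width = inclusive_extent(xmin, xmax)

print(len(covered), width)
```

Both numbers agree, whereas a plain xmax - xmin would come out one pixel short under this convention.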

annos[:, 2:] -= annos[:, :2] - 1 is the same as annos[:, 2:] = annos[:, 2:] - annos[:, :2] + 1, which follows the same principle as described above.
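The equivalence can be checked with a small NumPy sketch. The array `boxes` below is made-up sample data in [xmin, ymin, xmax, ymax] form, assuming inclusive corners; it is not taken from the dataset. The third variant is the + 1 form from the question, included only to show that it shrinks the box instead of counting pixels.

```python
import numpy as np

# Hypothetical boxes as [xmin, ymin, xmax, ymax] with inclusive corners.
boxes = np.array([[10, 20, 19, 39],
                  [0, 0, 0, 0]])   # second box covers a single pixel

# In-place form from the answer: subtracts (xmin - 1), i.e. adds 1 overall.
a = boxes.copy()
a[:, 2:] -= a[:, :2] - 1

# Expanded form: xmax - xmin + 1 and ymax - ymin + 1.
b = boxes.copy()
b[:, 2:] = b[:, 2:] - b[:, :2] + 1

# Form suggested in the question: subtracts (xmin + 1), giving xmax - xmin - 1.
c = boxes.copy()
c[:, 2:] -= c[:, :2] + 1

print(np.array_equal(a, b))  # the two forms agree
print(a)                     # rows are [xmin, ymin, width, height]
print(c)                     # widths and heights are 2 smaller than in a
```

Because the right-hand side of -= is evaluated as a whole before the subtraction, the sign of the constant flips: subtracting (xmin - 1) adds one pixel, while subtracting (xmin + 1) removes one.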

amoudgl commented 5 years ago

This was a silly one...got it. Thanks!