votchallenge / toolkit

The official VOT Challenge evaluation and analysis toolkit
http://www.votchallenge.net/
GNU General Public License v3.0

Maybe some bugs in the toolkit #98

Closed davidyang180 closed 1 year ago

davidyang180 commented 1 year ago

Hi! After debugging, I found that there may be some bugs in the toolkit. I don't know if this is a problem with the code I wrote, so I will report the bugs found here.

  1. Dev dataset evaluation process. When evaluating my tracker on the dev dataset, I found that the Quality metric stays low even though the visualized predicted masks look reasonable. Because the dev dataset provides ground truth for the frames, I verified the evaluation process by generating the ground-truth masks with the official io.py (I checked visually that the generated masks match the images). For a frame that has a ground-truth mask, the tracker directly reports the generated ground-truth mask (a binary image with the same shape as the frame); for a frame without one (the label in groundtruth_*.txt is 0), it reports None.

[image]

But the Quality metric is still very low. What could be the reason for this?

  2. Test dataset evaluation process. On the test dataset, when I evaluate my tracker with the vot command, the shape of the first-frame mask received through handle.objects() does not match the shape of the image. For example, in the animal sequence the mask from handle.objects() has shape (271, 850) while the image has shape (720, 1280). This problem does not occur on the dev dataset because its ground truth has no mask offset, whereas the ground-truth labels of some test sequences do have one. Is this caused by the toolkit not processing the mask offset?
lukacu commented 1 year ago

It is not completely clear, but it seems that you are trying to compute overlaps manually (at least in some steps). What you have to know is that masks are optimized when they are loaded into memory to reduce the memory footprint (e.g. the full ground truth of 144 sequences takes more than 40 GB of memory). Since tracking masks are usually limited to a part of an image, we encode only the bounding box of the actual object and an offset. This was used in past challenges and it worked well.
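To illustrate the idea of storing only the object's bounding box plus an offset, here is a minimal NumPy sketch. It is not the toolkit's implementation: the function names `crop_to_bounding_box` and `restore_full_mask` are hypothetical, and the object position in the example is made up (only the two shapes are taken from the animal sequence mentioned above).

```python
import numpy as np


def crop_to_bounding_box(mask):
    """Return (cropped_mask, (x, y)), where (x, y) is the top-left corner of the crop.

    Hypothetical helper for illustration only, not part of the VOT toolkit.
    """
    mask = np.asarray(mask)
    rows = np.flatnonzero(mask.any(axis=1))  # rows that contain foreground
    cols = np.flatnonzero(mask.any(axis=0))  # columns that contain foreground
    if rows.size == 0:
        # Empty mask: there is nothing to store.
        return np.zeros((0, 0), dtype=np.uint8), (0, 0)
    y0, y1 = rows[0], rows[-1] + 1
    x0, x1 = cols[0], cols[-1] + 1
    return mask[y0:y1, x0:x1].astype(np.uint8), (int(x0), int(y0))


def restore_full_mask(cropped, offset, image_shape):
    """Paste a cropped mask back into an all-zero mask of shape (height, width)."""
    x, y = offset
    full = np.zeros(image_shape, dtype=np.uint8)
    height, width = cropped.shape
    full[y:y + height, x:x + width] = cropped
    return full


# A 720x1280 frame whose object fills a 271x850 box (the position is made up).
full = np.zeros((720, 1280), dtype=np.uint8)
full[100:371, 200:1050] = 1

cropped, offset = crop_to_bounding_box(full)
print(cropped.shape, offset)  # (271, 850) (200, 100)
restored = restore_full_mask(cropped, offset, full.shape)
```

The cropped array is much smaller than the frame, which is why a mask can arrive with a shape that differs from the image shape.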

I would say that your problem may be related to not taking offset into account. We have computed scores for several trackers and they make sense. But if you have noticed a problem, then please provide a single-frame example where computation of overlap is wrong.
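As a rough sketch of what "taking the offset into account" means when computing an overlap by hand, the following NumPy code places both masks on a common canvas before computing intersection over union. The names `place` and `mask_overlap` are hypothetical, offsets are assumed to be non-negative `(x, y)` pairs, and returning 0.0 when both masks are empty is an assumption rather than the toolkit's documented behavior.

```python
import numpy as np


def place(mask, offset, shape):
    """Paste `mask` at offset (x, y) onto an all-False canvas of the given shape."""
    x, y = offset
    canvas = np.zeros(shape, dtype=bool)
    height, width = mask.shape
    canvas[y:y + height, x:x + width] = np.asarray(mask, dtype=bool)
    return canvas


def mask_overlap(mask_a, offset_a, mask_b, offset_b):
    """Intersection over union of two masks, each stored with its own (x, y) offset."""
    # Use a canvas that is just large enough to hold both masks.
    height = max(offset_a[1] + mask_a.shape[0], offset_b[1] + mask_b.shape[0])
    width = max(offset_a[0] + mask_a.shape[1], offset_b[0] + mask_b.shape[1])
    a = place(mask_a, offset_a, (height, width))
    b = place(mask_b, offset_b, (height, width))
    union = np.logical_or(a, b).sum()
    if union == 0:
        return 0.0  # assumption: two empty masks count as zero overlap
    return float(np.logical_and(a, b).sum() / union)


# Ground truth stored as a 2x2 crop at offset (x=2, y=1).
groundtruth = np.ones((2, 2), dtype=np.uint8)
# Prediction stored as a full 4x6 frame with no offset.
prediction = np.zeros((4, 6), dtype=np.uint8)
prediction[1:3, 3:5] = 1

with_offset = mask_overlap(groundtruth, (2, 1), prediction, (0, 0))
without_offset = mask_overlap(groundtruth, (0, 0), prediction, (0, 0))
```

In this toy case the overlap is 1/3 when the offset is applied and 0 when it is ignored, which is the kind of discrepancy that would show up as an unexpectedly low score.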

Also, please do not overuse the dev dataset; it has only been released to test the technical aspects of integration.

davidyang180 commented 1 year ago

Hi! So the shape of the mask obtained through handle = vot.VOT("mask", multiobject=True); objects = handle.objects() is not equal to the image shape. Do we need to handle the offset ourselves?

davidyang180 commented 1 year ago

By the way, during the tracking process, is the mask result reported by the tracker a binary image? (e.g. a 2D ndarray of shape (720, 1280) with values 0 or 1)

lukacu commented 1 year ago

Yes, the mask is a 2D image, but it is encoded as RLE with an offset during transmission or storage.
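For readers unfamiliar with run-length encoding (RLE), here is a generic sketch of the idea. It is not the exact wire format used by trax, which this thread does not describe: the convention of alternating background and foreground runs in row-major order, and the names `rle_encode` and `rle_decode`, are assumptions made for illustration.

```python
import numpy as np


def rle_encode(mask):
    """Encode a binary mask as alternating run lengths in row-major order.

    The first run always counts background pixels (it may be 0).
    Illustrative convention only, not necessarily the one trax uses.
    """
    flat = np.asarray(mask, dtype=bool).ravel()
    runs = []
    current = False  # start by counting background
    count = 0
    for value in flat:
        if value == current:
            count += 1
        else:
            runs.append(count)
            current = not current
            count = 1
    runs.append(count)
    return runs


def rle_decode(runs, shape):
    """Rebuild a (height, width) binary mask from the runs produced by rle_encode."""
    flat = np.zeros(shape[0] * shape[1], dtype=np.uint8)
    position = 0
    foreground = False
    for run in runs:
        if foreground:
            flat[position:position + run] = 1
        position += run
        foreground = not foreground
    return flat.reshape(shape)


mask = np.array([[0, 1, 1],
                 [1, 0, 0]], dtype=np.uint8)
runs = rle_encode(mask)
print(runs)  # [1, 3, 2]
decoded = rle_decode(runs, mask.shape)
```

Under this convention a mask without any foreground pixels collapses to a single background run, so an encoded message for it carries essentially only its size and offset.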

davidyang180 commented 1 year ago

Hi! When I report the tracker results through the toolkit, the figure below shows that the first-frame ground-truth mask printed by the trax server is a correct RLE encoding.

[image]

During tracking, however, the tracker reports a 2D image mask, but the intermediate tracker result printed by the trax server contains only an offset and a size. Is this normal?

[image]

lukacu commented 1 year ago

The state looks like an empty mask. It has the full dimensions of the input frame, but no foreground pixels that need encoding. The offset is optional and can be used to reduce the size of a mask.

But there is a small issue that was perhaps not explained well enough: the trax protocol does not really know the size of an image, so if the mask is truncated, the top-left corner can be recovered thanks to the offset, but the empty bottom and right parts are not recovered automatically. This means that the matrix returned by the Python wrapper may not have the same size as the input image; you have to pad it with zeros if you want matrices of the same size.
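The zero padding described above can be sketched with NumPy as follows. The helper name `pad_to_image_size` is hypothetical, and the sketch assumes the received matrix already starts at the image's top-left corner, so only the bottom and right sides are missing.

```python
import numpy as np


def pad_to_image_size(mask, image_shape):
    """Pad a mask with zeros on the bottom and right so it matches the image size.

    `image_shape` may be (height, width) or (height, width, channels).
    Hypothetical helper for illustration, not part of the toolkit.
    """
    mask = np.asarray(mask)
    height, width = image_shape[:2]
    # Guard against a mask that is larger than the image.
    mask = mask[:height, :width]
    pad_bottom = height - mask.shape[0]
    pad_right = width - mask.shape[1]
    return np.pad(mask, ((0, pad_bottom), (0, pad_right)),
                  mode="constant", constant_values=0)


# A mask that stops at the object's bottom-right corner (sizes are illustrative).
received = np.ones((371, 1050), dtype=np.uint8)
padded = pad_to_image_size(received, (720, 1280, 3))
print(padded.shape)  # (720, 1280)
```

After padding, the mask can be compared with or drawn over the frame pixel by pixel.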

lukacu commented 1 year ago

Please close the issue if your questions were answered.

davidyang180 commented 1 year ago

Hi! I am sure that there is a bug when trax reports the results. Taking the cat-18 sequence of the dev dataset as an example, I visually verified that the tracker's output mask has foreground pixels (pixels with value 1) in both the 2nd frame and the 626th frame. After both frames are reported, however, the mask of the 2nd frame is not encoded. What is the problem?

Visualization of the mask in the 2nd frame:

[image]

The tracking mask of the 2nd frame after trax encoding:

[image]

Visualization of the mask in the 626th frame:

[image]

The tracking mask of the 626th frame after trax encoding:

[image]

lukacu commented 1 year ago

The code that builds the low-level communication has been used in several challenges already, and although it was extended to multiple objects this year, I find it unlikely that there would be a problem with it. We have also tested it with several trackers and, as far as I can tell, we found no problems. I will check with other people in the organizing team who have tested other trackers.

The most likely issue that I see is that the input mask is somehow different and not correctly recognized in one of the cases.

davidyang180 commented 1 year ago

Hi! After more debugging, the code is now able to encode the corresponding mask. All I did was add a few extra lines that regenerate the mask with the same contents. The two masks look exactly the same from the top-level code, yet only the regenerated one is encoded successfully. I think this may involve a low-level encoding issue, but I have not been able to find the bug yet.
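The thread never establishes the root cause. One possibility, and it is only a guess, is that the original array differed in dtype, value range, or memory layout even though its contents looked the same. The sketch below shows how such differences could be inspected and removed before reporting; `describe_mask` and `normalize_mask` are hypothetical helpers, not toolkit functions.

```python
import numpy as np


def describe_mask(mask):
    """Summarize properties that can differ between two visually identical masks."""
    mask = np.asarray(mask)
    return {
        "dtype": str(mask.dtype),
        "shape": mask.shape,
        "c_contiguous": bool(mask.flags.c_contiguous),
        "values": np.unique(mask).tolist(),
    }


def normalize_mask(mask):
    """Return a fresh C-contiguous uint8 array that contains only 0 and 1."""
    mask = np.asarray(mask)
    return np.ascontiguousarray(mask > 0, dtype=np.uint8)


# A mask that looks fine when plotted, but is a float64, non-contiguous view
# holding 255 instead of 1 (a made-up example of a "somehow different" mask).
suspicious = (np.ones((4, 6), dtype=np.float64) * 255)[:, ::2]
before = describe_mask(suspicious)
clean = normalize_mask(suspicious)
after = describe_mask(clean)
print(before)
print(after)
```

Comparing the two summaries for the failing frame and a working frame may help narrow down which property the encoder is sensitive to.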

lukacu commented 1 year ago

OK, so I am not exactly sure from the response what the issue was or whether we can improve the instructions somehow. If you have any hints, please let us know.