Closed — davidyang180 closed this issue 1 year ago
It is not completely clear, but it seems that you are trying to compute overlaps manually (at least for some steps). What you should know is that masks are optimized when loaded into memory to reduce the memory footprint (e.g. the full ground truth of 144 sequences takes more than 40 GB of memory). Since tracking masks are usually limited to a part of the image, we encode only the bounding box of the actual object plus an offset. This was used in past challenges and it worked well.
I would say that your problem may be related to not taking the offset into account. We have computed scores for several trackers and they make sense. But if you have noticed a problem, please provide a single-frame example where the overlap computation is wrong.
Also, please do not overuse the dev dataset; it has only been released to test the technical aspects of integration.
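The bounding-box-plus-offset encoding described above can be sketched roughly like this (a simplified NumPy illustration of the idea, not the toolkit's actual implementation; `encode_mask` and `decode_mask` are hypothetical names):

```python
import numpy as np

def encode_mask(mask):
    # Store only the bounding box of the foreground plus its (x, y) offset,
    # which is what saves memory when most of the frame is background.
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        return (0, 0), np.zeros((0, 0), dtype=mask.dtype)  # empty mask
    y0, x0 = int(ys.min()), int(xs.min())
    crop = mask[y0:int(ys.max()) + 1, x0:int(xs.max()) + 1]
    return (x0, y0), crop

def decode_mask(offset, crop, frame_shape):
    # Place the cropped mask back into a full-size frame at the offset.
    full = np.zeros(frame_shape, dtype=crop.dtype)
    x0, y0 = offset
    full[y0:y0 + crop.shape[0], x0:x0 + crop.shape[1]] = crop
    return full

mask = np.zeros((720, 1280), dtype=np.uint8)
mask[300:350, 500:600] = 1
offset, crop = encode_mask(mask)      # crop is only 50x100 instead of 720x1280
restored = decode_mask(offset, crop, mask.shape)
assert (restored == mask).all()
```

Not taking `offset` into account when comparing such a crop against a full-frame mask would produce exactly the kind of wrong overlap scores described above.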
Hi! It means that the shape of the mask obtained by the `handle = vot.VOT("mask", multiobject=True); objects = handle.objects()` command
is not equal to the image shape. Do we need to handle the offset ourselves?
By the way, during tracking, is the mask result reported by the tracker a binary image (e.g. a 2D ndarray of shape (720, 1280) with values 0 or 1)?
Yes, the mask is a 2D image, but it is encoded during transmission and storage as RLE with an offset.
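The run-length-encoding idea can be illustrated with a minimal sketch (this only shows the principle of RLE on a flattened binary mask; the actual trax wire format differs in detail, and `rle_encode`/`rle_decode` are hypothetical names):

```python
import numpy as np

def rle_encode(mask):
    # Run lengths over the flattened mask, alternating background/foreground,
    # starting with background (a 0-length first run means the mask starts
    # with foreground).
    runs, count, current = [], 0, 0
    for v in mask.ravel():
        if v == current:
            count += 1
        else:
            runs.append(count)
            current, count = v, 1
    runs.append(count)
    return runs

def rle_decode(runs, shape):
    flat = np.zeros(int(np.prod(shape)), dtype=np.uint8)
    pos, value = 0, 0
    for run in runs:
        flat[pos:pos + run] = value
        pos += run
        value = 1 - value
    return flat.reshape(shape)

mask = np.array([[0, 0, 1, 1],
                 [1, 0, 0, 0]], dtype=np.uint8)
runs = rle_encode(mask)                          # [2, 3, 3]
assert (rle_decode(runs, mask.shape) == mask).all()
```

Combined with the offset, only the object's bounding box needs to be run-length encoded rather than the whole frame.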
Hi! When I report the tracker results through the toolkit, you can see from the figure below that the first-frame ground-truth mask printed by the trax server is a correct RLE encoding.
But during tracking, the tracker reports a 2D image mask, yet the intermediate tracker result printed by the trax server only has an offset and a size. Is this normal?
The state looks like an empty mask. It has the full dimensions of the input frame, but no foreground pixels that need encoding. The offset is optional and can be used to reduce the size of a mask.
But there is a small issue that was perhaps not explained well enough: the trax protocol does not really know the size of the image, so if the mask is truncated, the top-left corner can be recovered from the offset, but the empty bottom and right parts are not recovered automatically. This means that the matrix returned by the Python wrapper may not have the same size as the input image; you have to pad it with zeros if you want matrices of the same size.
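The zero-padding described above can be sketched as follows (a minimal helper assuming the top-left corner has already been recovered from the offset; `pad_to_frame` is a hypothetical name, not part of the toolkit, and the shapes are the ones mentioned in this thread):

```python
import numpy as np

def pad_to_frame(mask, frame_shape):
    # Zero-pad a mask that may be truncated at the bottom/right so it
    # matches the full frame size.
    pad_y = frame_shape[0] - mask.shape[0]
    pad_x = frame_shape[1] - mask.shape[1]
    return np.pad(mask, ((0, pad_y), (0, pad_x)))

truncated = np.ones((271, 850), dtype=np.uint8)  # mask as returned by the wrapper
full = pad_to_frame(truncated, (720, 1280))      # same size as the input image
assert full.shape == (720, 1280)
```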
Please close the issue if your questions were answered.
Hi! I am sure there is a bug when trax reports the results. Taking the cat-18 sequence of the dev dataset as an example, I visually verified that the mask output of the tracker has foreground (positive, value 1) pixels in both the 2nd frame and the 626th frame. But after these two frames are reported, the mask of the 2nd frame is not encoded. What is the problem?
Visualization of the mask in 2nd frame:
The result of the tracking mask of the 2nd frame after trax encoding:
Visualization of the mask in 626th frame:
The result of the tracking mask of the 626th frame after trax encoding:
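One way to narrow down a case like this is to inspect the mask immediately before it is handed to the toolkit, confirming it really contains foreground pixels and seeing which bounding box an encoder would transmit (a sketch with a hypothetical mask, not the actual cat-18 output):

```python
import numpy as np

# Hypothetical tracker output for one frame.
mask = np.zeros((720, 1280), dtype=np.uint8)
mask[100:150, 200:260] = 1

print("foreground pixels:", int(mask.sum()))
if mask.any():
    ys, xs = np.nonzero(mask)
    bbox = (int(xs.min()), int(ys.min()),
            int(xs.max() - xs.min() + 1), int(ys.max() - ys.min() + 1))
    print("bounding box (x, y, w, h):", bbox)
else:
    print("mask is empty -- it would be encoded without foreground runs")
```

If this check passes right before reporting but the server still prints an empty encoding, the problem lies somewhere in the transmission path rather than in the tracker itself.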
The code that implements the low-level communication has been used in several challenges already, and although it was extended to multiple objects this year, I find it unlikely that there is a problem with it. We have also tested it with several trackers and, as far as I can tell, found no problems. I will check with other people on the organizing team who have tested other trackers.
The most likely issue I see is that the input mask is somehow different and not correctly recognized in one of the cases.
Hi! After debugging, the code is now able to encode the corresponding mask, but only after I added a few extra lines that regenerate the mask as an identical copy. Although the two masks are exactly the same as seen from the top-level code, only the regenerated one is encoded successfully. I think this may involve a low-level encoding issue; I have not yet been able to locate the bug.
OK, so I am not exactly sure from the response what the issue was and whether we can improve the instructions somehow. If there are any hints, please let us know.
Hi! After debugging, I found that there may be some bugs in the toolkit. I don't know whether this is a problem with my own code, so I will report the bugs I found here.
But the Quality metric is also very low. I would like to ask what the reason for this is.
The shape of the mask obtained by the `handle.objects()` command is inconsistent with the shape of the image. E.g. in the animal sequence, the shape from `handle.objects()` is (271, 850), while the shape of the image is (720, 1280). This problem does not arise on the dev dataset, because its ground truth has no mask offset. However, the test dataset does have a mask offset in the ground-truth labels of some sequences. Is the mask offset left unprocessed in the toolkit?