Hi! As the figure below shows, the shape entries in the VOTS2023-dev dataset that start with 'm' can ultimately be decoded into the region.mask of the corresponding frame. However, ground-truth entries marked with 0 or 1 are decoded into the region.Special_object format, which does not seem to carry any mask information. What is the difference between the two? Do those frames have mask information? How should they be used during evaluation?