Hello! The definition of the YouTubeVIS2021 dataset on Codalab (https://competitions.codalab.org/competitions/28988#participate-get_data) differs from the annotation files available through its download links. Where is the block annotation{
"id" : int,
"video_id" : int,
"category_id" : int,
"segmentations" : [RLE or [polygon] or None],
"areas" : [float or None],
"bboxes" : [[x,y,width,height] or None],
"iscrowd" : 0 or 1,
} in these JSON files?
How can a model be trained on this data without any information about masks, boxes, etc.? Can you advise how to train the model with my own classes and masks?
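For context, this is roughly how I would expect to build one such annotation entry for my own data, with one entry per object track and one list element per frame. It is only a sketch under assumptions: the helper names (`mask_to_rle`, `mask_to_bbox`, `build_annotation`) are hypothetical, the masks are plain nested lists of 0/1, and the uncompressed RLE layout (column-major `counts`, `size` as `[height, width]`) follows the COCO convention, which I am assuming applies here too.

```python
import json


def mask_to_rle(mask):
    """Encode a binary mask (list of rows of 0/1) as uncompressed COCO-style RLE."""
    height, width = len(mask), len(mask[0])
    counts = []
    previous, run = 0, 0  # runs alternate and always start with a run of zeros
    for x in range(width):  # COCO RLE walks the mask in column-major order
        for y in range(height):
            value = mask[y][x]
            if value != previous:
                counts.append(run)
                previous, run = value, 0
            run += 1
    counts.append(run)
    return {"size": [height, width], "counts": counts}


def mask_to_bbox(mask):
    """Return [x, y, width, height] of the tight box around a non-empty mask."""
    ys = [y for y, row in enumerate(mask) if any(row)]
    xs = [x for x in range(len(mask[0])) if any(row[x] for row in mask)]
    return [
        float(xs[0]),
        float(ys[0]),
        float(xs[-1] - xs[0] + 1),
        float(ys[-1] - ys[0] + 1),
    ]


def build_annotation(ann_id, video_id, category_id, frame_masks):
    """Build one annotation entry; frames where the object is absent become None."""
    segmentations, areas, bboxes = [], [], []
    for mask in frame_masks:
        if mask is None or not any(any(row) for row in mask):
            segmentations.append(None)
            areas.append(None)
            bboxes.append(None)
            continue
        segmentations.append(mask_to_rle(mask))
        areas.append(float(sum(sum(row) for row in mask)))
        bboxes.append(mask_to_bbox(mask))
    return {
        "id": ann_id,
        "video_id": video_id,
        "category_id": category_id,
        "segmentations": segmentations,
        "areas": areas,
        "bboxes": bboxes,
        "iscrowd": 0,
    }


# Hypothetical two-frame track: the object is visible in frame 0 and absent in frame 1.
example = build_annotation(1, 1, 1, [[[0, 1], [0, 1]], None])
print(json.dumps(example))
```

If this is the right structure, I assume the remaining step is to put such entries under an `annotations` key next to the video and category lists, but I could not confirm that from the downloaded files.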