Annotations and masks in YouTubeVIS2021 dataset

Hello! There is a discrepancy between the definition of the YouTubeVIS2021 dataset on Codalab (https://competitions.codalab.org/competitions/28988#participate-get_data) and the annotation files available from its download links. Where is the block annotation{ "id" : int, "video_id" : int, "category_id" : int, "segmentations" : [RLE or [polygon] or None], "areas" : [float or None], "bboxes" : [[x,y,width,height] or None], "iscrowd" : 0 or 1, } in these JSON files? How can a model be trained on this data without any information about masks, boxes, etc.? Can you advise how to train the model with my own classes and masks?
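To make the question concrete, here is a minimal sketch of how one entry in the annotation format quoted above could be built from custom per-frame binary masks. It rests on several assumptions: the masks are plain nested lists of 0/1, the RLE is the uncompressed COCO-style encoding (column-major, starting with a run of zeros), and the helper names `mask_to_rle`, `mask_to_bbox`, `build_annotation` and `has_annotations` are hypothetical and not part of MaskTrackRCNN. Whether the training code accepts exactly this layout is not confirmed.

```python
import json


def mask_to_rle(mask):
    """Encode a binary mask (list of rows) as uncompressed COCO-style RLE.

    Assumption: pixels are scanned column by column and the first count
    is the length of the leading run of zeros (possibly 0).
    """
    height, width = len(mask), len(mask[0])
    counts, prev, run = [], 0, 0
    for x in range(width):
        for y in range(height):
            value = 1 if mask[y][x] else 0
            if value != prev:
                counts.append(run)
                run, prev = 0, value
            run += 1
    counts.append(run)
    return {"size": [height, width], "counts": counts}


def mask_to_bbox(mask):
    """Return [x, y, width, height] of the foreground, or None if empty."""
    points = [(x, y) for y, row in enumerate(mask)
              for x, value in enumerate(row) if value]
    if not points:
        return None
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return [min(xs), min(ys), max(xs) - min(xs) + 1, max(ys) - min(ys) + 1]


def build_annotation(ann_id, video_id, category_id, frame_masks):
    """Build one annotation entry; frame_masks holds one mask or None per frame."""
    segmentations, areas, bboxes = [], [], []
    for mask in frame_masks:
        bbox = mask_to_bbox(mask) if mask is not None else None
        if bbox is None:
            # Object absent in this frame: the format uses None placeholders.
            segmentations.append(None)
            areas.append(None)
            bboxes.append(None)
        else:
            segmentations.append(mask_to_rle(mask))
            areas.append(float(sum(1 for row in mask for v in row if v)))
            bboxes.append(bbox)
    return {
        "id": ann_id,
        "video_id": video_id,
        "category_id": category_id,
        "segmentations": segmentations,
        "areas": areas,
        "bboxes": bboxes,
        "iscrowd": 0,
    }


def has_annotations(dataset):
    """Check whether a loaded JSON dict carries a non-empty annotations block."""
    return bool(dataset.get("annotations"))


if __name__ == "__main__":
    frames = [[[0, 1], [0, 1]], None]  # object visible in frame 0 only
    entry = build_annotation(1, 1, 1, frames)
    print(json.dumps(entry, indent=2))
```

The same `has_annotations` check can be applied to a downloaded file after `json.load` to see whether it contains the annotation block at all.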

youtubevos / MaskTrackRCNN
