zhang-tao-whu / DVIS

DVIS: Decoupled Video Instance Segmentation Framework

How to make a dataset for video instance segmentation model? #30

Open SuppurNewer opened 6 months ago

SuppurNewer commented 6 months ago

Hi! DVIS is a great model for video tasks. I've finished labeling my video data: each object in each frame of each video has segmentation, classification, and persistent ID information, and I've built a JSON file like:

```
{
  "info": {},
  "licenses": [],
  "videos": [],        # per-video information
  "categories": [],
  "annotations": []    # per-instance information; an instance is the collection of
                       # one object's (unique ID) annotations across the video's frames
}
```


I have a question: what format and information does the 'image_instance' dataset type expect? In my case, what do I need to do?

Thanks!

SuppurNewer commented 6 months ago

```yaml
DATASETS:
  DATASET_NEED_MAP: [True, False, ]
  DATASET_TYPE: ['image_instance', 'video_instance', ]
  DATASET_TYPE_TEST: ['video_instance', ]
  DATASET_RATIO: [1.0, 0.75]
  TRAIN: ("coco2ytvis2019_train", "mydata_train")
  TEST: ("mydata_val",)
```

zhang-tao-whu commented 6 months ago

Hello, thanks for your attention. Please refer to here for the annotation format of YTVIS. The instance annotation format is basically the same as COCO's; the difference is that `bboxes` and `segmentations` are lists of length T holding the object's annotations across all frames, with None used as a placeholder for frames where the object does not appear.
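To make that concrete, below is a minimal sketch (written as Python dicts for readability, values are placeholders) of what one video entry and one instance entry might look like in a YTVIS-2019-style JSON file. Exact field names should be checked against the annotation format linked above; this is only an illustration of the per-frame lists with None placeholders.

```python
# Hypothetical video entry: T = 3 frames in this example.
video = {
    "id": 1,
    "width": 1280,
    "height": 720,
    "length": 3,
    "file_names": ["video_0001/00000.jpg", "video_0001/00001.jpg", "video_0001/00002.jpg"],
}

# Hypothetical instance entry: one object tracked through the whole video.
# bboxes / segmentations / areas are lists of length T; frames where the
# object is absent hold None (null in the JSON file).
annotation = {
    "id": 1,            # unique instance id across the dataset
    "video_id": 1,      # which video this instance belongs to
    "category_id": 2,
    "iscrowd": 0,
    "bboxes": [[100.0, 50.0, 40.0, 80.0], None, [110.0, 55.0, 42.0, 78.0]],
    "segmentations": [
        {"size": [720, 1280], "counts": "<RLE string>"},  # COCO-style RLE or polygon
        None,                                             # object not visible in frame 1
        {"size": [720, 1280], "counts": "<RLE string>"},
    ],
    "areas": [3200.0, None, 3276.0],
}
```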

zhang-tao-whu commented 6 months ago

If you want to train jointly on the COCO dataset and your own dataset, and the categories in your dataset are inconsistent with those in YTVIS19, you will need to implement a category mapping that converts the COCO annotations to your dataset's categories. Please refer to here.
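As an illustration of the kind of mapping involved (the dict name and helper below are hypothetical, not part of DVIS; the repository provides its own COCO-to-YTVIS mapping that you would adapt):

```python
# Hypothetical mapping from COCO category ids to the ids used in a custom
# video dataset. Only COCO classes that also exist in the custom dataset get
# an entry; annotations of unmapped classes are dropped during conversion.
COCO_TO_MYDATA = {
    1: 1,    # person -> person
    3: 2,    # car    -> vehicle
    18: 3,   # dog    -> animal
}

def remap_coco_annotations(coco_annotations, category_map):
    """Keep only COCO annotations whose category exists in the custom dataset,
    rewriting category_id into the custom dataset's id space."""
    remapped = []
    for ann in coco_annotations:
        new_id = category_map.get(ann["category_id"])
        if new_id is None:
            continue  # class not present in the custom dataset
        remapped.append(dict(ann, category_id=new_id))
    return remapped
```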