Closed Sairam13001 closed 1 year ago
Hi, thanks for your interest in our work. In my understanding, it is not necessary to divide your custom dataset into DET and VID; that split is just a particular division of the ILSVRC2015 dataset. For training the single-frame baseline, you can use your whole custom dataset, but no temporal knowledge is learned at that stage: each frame of the video data is treated as a separate image for training the image detector. For details on how to re-organize a custom dataset, I'd suggest you refer to the JSON data we provide at this link: https://drive.google.com/drive/folders/1cCXY41IFsLT-P06xlPAGptG7sc-zmGKF?usp=sharing.
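For reference, here is a minimal sketch of that COCO-VID style JSON layout. The field names follow MMTracking's imagenet2coco_vid.py converter, but the class name, video name, file paths, and numbers below are all placeholder values, not from the actual dataset:

```python
import json

# Hedged sketch of a COCO-VID style annotation file, assuming the field
# names used by MMTracking's imagenet2coco_vid.py. All concrete values
# (class names, paths, sizes, boxes) are illustrative placeholders.
dataset = {
    "categories": [{"id": 1, "name": "my_class"}],  # your own classes
    "videos": [{"id": 1, "name": "train/video_0001"}],
    "images": [
        {
            "id": 1,
            "video_id": 1,   # which video this frame belongs to
            "frame_id": 0,   # 0-based index of the frame within the video
            "file_name": "train/video_0001/000000.JPEG",
            "width": 1280,
            "height": 720,
        },
    ],
    "annotations": [
        {
            "id": 1,
            "image_id": 1,
            "video_id": 1,
            "category_id": 1,
            "instance_id": 0,              # tracking id (see below)
            "bbox": [100, 150, 200, 120],  # [x, y, w, h]
            "area": 200 * 120,
            "iscrowd": False,
            "occluded": False,
            "generated": False,
        },
    ],
}

with open("custom_vid_train.json", "w") as f:
    json.dump(dataset, f)
```

For the single-frame baseline, each entry in `images` is consumed as an independent training image; the `videos`/`frame_id` linkage only matters once temporal training starts.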
Ohh, okay. I get it. Thank you for your quick reply.
I went through the JSON files. Could you let me know what 'Instance ID' means?
Hi, the instance ID is actually not used in our code. Since we follow MMTracking for preparing the JSON data (see https://github.com/open-mmlab/mmtracking/blob/master/tools/convert_datasets/ilsvrc/imagenet2coco_vid.py), our JSON files also include this key. In my understanding, 'Instance ID' is used for multi-object tracking, because MMTracking is a unified framework for both video object detection and multi-object tracking.
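To illustrate the idea: `instance_id` links detections of the same physical object across frames, which is exactly what a tracker needs. A small sketch with made-up annotation dicts (not real data):

```python
from collections import defaultdict

# Made-up annotation entries: instance 0 is one object seen in two
# consecutive frames; instance 1 is a different object in frame 2.
annotations = [
    {"image_id": 1, "instance_id": 0, "bbox": [10, 10, 50, 40]},
    {"image_id": 2, "instance_id": 0, "bbox": [12, 11, 50, 40]},
    {"image_id": 2, "instance_id": 1, "bbox": [200, 80, 30, 60]},
]

# Grouping by instance_id recovers per-object tracks (frame lists).
tracks = defaultdict(list)
for ann in annotations:
    tracks[ann["instance_id"]].append(ann["image_id"])

print(dict(tracks))  # {0: [1, 2], 1: [2]}
```

A pure detector like TransVOD scores boxes per frame and never groups them this way, which is why the key can safely be ignored (or set to a dummy value) in a custom dataset.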
Understood. Thank you so much.
Your work is amazing. Thank you for open sourcing the code.
I want to train a model on a custom video dataset that has 40 training videos with around 15,000 frames.
In your approach, you use both the ILSVRC2015 DET and VID datasets, right? Is it necessary to have two such datasets?
If not, could you kindly let me know how I can prepare my dataset for training?
Would you suggest I split my video dataset into DET and VID portions, using DET for training the SingleFrameBaseline and VID for training TransVOD Lite?