qianyuzqy / TransVOD_Lite

(TPAMI 2023) TransVOD:End-to-End Video Object Detection with Spatial-Temporal Transformers (implementations of TransVOD Lite).
Apache License 2.0
37 stars 6 forks source link

Training on a custom dataset #2

Closed Sairam13001 closed 1 year ago

Sairam13001 commented 1 year ago

Your work is amazing. Thank you for open sourcing the code.

I want to train a model on a custom video dataset that has 40 train videos with around 15000 frames.

In your approach, you are using both ILSVRC2015 DET and VID datasets right. Is there a necessity to have two such datasets.

If no, could you kindly let me know how I can prepare my dataset for training.

Would you suggest I break my video dataset into DET and VID datasets and use DET for training SingleFrameBaseline and VID for training TransVOD Lite?

qianyuzqy commented 1 year ago

Hi, thanks for your interest in our work. In my understanding, it is not necessary to divide your custom dataset into DET and VID. It is just a special division of ILSVRC2015 datasets. For training the single frame baseline, you can use all of the custom datasets for training, but there is no learning of temporal knowledge during this stage. Each frame of the video data is regarded as a separate image for training the image detector. For more details on how to re-organize the customs dataset, I'd suggest you refer to the JSON data we provided in this link: https://drive.google.com/drive/folders/1cCXY41IFsLT-P06xlPAGptG7sc-zmGKF?usp=sharing.

Sairam13001 commented 1 year ago

Ohh, okay. I get it. Thank you for your quick reply.

Sairam13001 commented 1 year ago

I went through the JSON files. Could you let me know what 'Instance ID' means?

qianyuzqy commented 1 year ago

Hi, instance ID is not used in our code in fact. Since we are following MMTracking for preparing the JSON data in this link: https://github.com/open-mmlab/mmtracking/blob/master/tools/convert_datasets/ilsvrc/imagenet2coco_vid.py, our JSON files also include this key. In my understanding, 'Instance ID' is used for multi-object tracking because MMTracking is a unified framework for video object detection and multi-object tracking.

Sairam13001 commented 1 year ago

Understood. Thank you so much.