I'd like to leave another question here and hope you can kindly answer it.
For the joint training on YTB-VOS, is the 12-epoch pretraining process still needed? Or can the model be trained directly on the joint dataset of video-like RefCOCO/+/g and YTB-VOS (e.g., for Video Swin, only the K400-pretrained parameters would be needed)? In addition, are all the hyperparameters kept the same as for the Ref-COCO pretrained models?
Hi,
We have updated the pretraining main script. Please refer to this issue https://github.com/wjn922/ReferFormer/issues/7 to start the pretraining. Note that we pretrain the models on 32 V100 GPUs.
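For reference, a 32-GPU run of this kind is usually launched with `torch.distributed.launch` across 4 nodes with 8 GPUs each. The sketch below only illustrates that topology; the entry-point name, environment variables, and output path are placeholders, so please take the exact command from the linked issue.

```bash
# Rough multi-node launch sketch (4 nodes x 8 GPUs = 32 GPUs), run once per node.
# main_pretrain.py and the output path are placeholders; see issue #7 for the actual command.
python -m torch.distributed.launch \
    --nproc_per_node=8 --nnodes=4 --node_rank=${NODE_RANK} \
    --master_addr=${MASTER_ADDR} --master_port=29500 --use_env \
    main_pretrain.py \
    --output_dir pretrain_output
```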
For the joint training, the Ref-YTVOS and Ref-COCO/+/g datasets are mixed, so the models are trained directly and do not need the pretraining process. The hyperparameters are similar to those of the pretraining process, except that we set `--batch_size 1 --num_frames 5 --freeze_text_encoder`.
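For reference, a minimal sketch of what such a joint-training launch could look like. Only `--batch_size 1 --num_frames 5 --freeze_text_encoder` come from the answer above; the script name, backbone choice, GPU count, and output path are placeholders.

```bash
# Minimal joint-training sketch on a single node with 8 GPUs.
# Only --batch_size 1 --num_frames 5 --freeze_text_encoder are confirmed above;
# main.py, the backbone value, and the output path are placeholders.
python -m torch.distributed.launch --nproc_per_node=8 --use_env \
    main.py \
    --batch_size 1 \
    --num_frames 5 \
    --freeze_text_encoder \
    --backbone video_swin_t_p4w7 \
    --output_dir joint_train_output
```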
We have no plans to release the joint training code in the short term; the expected release date would be 1~2 months from now.
I see. Thank you for the detailed reply.
Thank you for releasing the code of ReferFormer and the subsequent update of the pretraining code.
Could you please also release the scripts for the pretraining process? I have tried to use the hyperparameters mentioned in the paper (such as the multi-step LR scheduler). However, the released code uses a StepLRScheduler rather than a MultiStep one, and my run got stuck and failed. I'm therefore wondering whether the released pretraining code needs a special setup. The pretraining process consumes a lot of computational resources, and I don't want to waste any of the GPU cards. It would be appreciated if you could help with this.
Thanks in advance.