wjn922 / ReferFormer

[CVPR2022] Official Implementation of ReferFormer
Apache License 2.0
322 stars · 25 forks

Issue w.r.t pretraining models #11

Closed youthHan closed 2 years ago

youthHan commented 2 years ago

Thank you for releasing the code of ReferFormer and the subsequent update with the pretraining code.

Could you please also release the scripts for the pretraining process? I have tried to use the hyperparameters mentioned in the paper (such as the multi-step LR scheduler). However, the released code uses a StepLR scheduler rather than a MultiStep one, and my run got stuck and failed. So I'm wondering whether the released pretraining code needs a special setup. The pretraining process consumes a lot of compute, and I don't want to waste any GPU hours. It would be appreciated if you could help with this.
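For reference, the difference between the two schedulers is easy to see in PyTorch: StepLR decays the learning rate every fixed number of epochs, while MultiStepLR decays it only at an explicit list of milestones. A minimal sketch follows; the milestones `[8, 10]` and the base learning rate are illustrative assumptions, not the paper's exact schedule.

```python
import torch

# Dummy parameter just so the optimizer has something to manage.
params = [torch.nn.Parameter(torch.zeros(1))]
opt = torch.optim.AdamW(params, lr=1e-4)

# MultiStepLR: lr is multiplied by gamma at each listed milestone epoch.
# Milestones [8, 10] are an assumption for illustration only.
sched = torch.optim.lr_scheduler.MultiStepLR(opt, milestones=[8, 10], gamma=0.1)

lrs = []
for epoch in range(12):
    lrs.append(opt.param_groups[0]["lr"])  # lr in effect for this epoch
    opt.step()
    sched.step()

# Epochs 0-7 run at 1e-4, epochs 8-9 at 1e-5, epochs 10-11 at 1e-6.
```

A StepLR with `step_size=8` would instead decay again at epoch 16, not epoch 10, which is why the two schedules diverge over a 12-epoch run.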

Thanks in advance.

youthHan commented 2 years ago

Let me leave another question here; I hope you can kindly answer it.

For the joint training on YTB-VOS, is the 12-epoch pretraining process still needed? Or can the model be trained directly on the joint dataset of video-like RefCOCO/+/g and YTB-VOS (e.g., for Video-Swin, only the K400-pretrained parameters are needed)? In addition, are all the hyperparameters kept the same as those used for the Ref-COCO pretrained models?

wjn922 commented 2 years ago

Hi,

We have updated the pretraining main script. Please refer to this issue https://github.com/wjn922/ReferFormer/issues/7 to start the pretraining. Note, however, that we pretrain the models on 32 V100 GPUs.

For the joint training, the Ref-YTVOS and Ref-COCO/+/g datasets are mixed, so the models are trained directly and do not need the pretraining process. The hyperparameters are similar to those of the pretraining process, except for setting --batch_size 1 --num_frames 5 --freeze_text_encoder. We have no plans to release the joint training in the short term; the expected release date is 1~2 months from now.
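Putting those flags together, a joint-training launch might look like the sketch below. This is an assumption, not the official script: the entry point `main.py`, the `--dataset_file joint` value, and the output path are hypothetical; only `--batch_size 1 --num_frames 5 --freeze_text_encoder` come from the reply above.

```shell
# Hypothetical joint-training launch sketch (8 GPUs on one node).
# "main.py" and "--dataset_file joint" are assumptions; the three flags
# below them are the ones confirmed by the maintainer.
python -m torch.distributed.launch --nproc_per_node=8 --use_env main.py \
    --dataset_file joint \
    --batch_size 1 \
    --num_frames 5 \
    --freeze_text_encoder \
    --output_dir ./outputs/joint_train
```

Since the datasets are mixed, no separate 12-epoch Ref-COCO pretraining stage precedes this run.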

youthHan commented 2 years ago

I see. Thank you for the detailed reply.