wjn922 / ReferFormer

[CVPR2022] Official Implementation of ReferFormer
Apache License 2.0

Finetuning on ref-davis17? #33

Open Jay-IPL opened 1 year ago

Jay-IPL commented 1 year ago

Nice work. In the paper, it says "most of our experiments follow the pretrain-then-finetune process." However, in this GitHub repo, it says "As described in the paper, we report the results using the model trained on Ref-Youtube-VOS without finetune."

Did you finetune the pre-trained model on ref-davis17?

wjn922 commented 1 year ago

We did not finetune the pre-trained model on ref-davis17.

Jay-IPL commented 1 year ago

Thanks for the clarification!

I saw the details in the supplementary section: training uses a window size of 5. How about inference? Did you use the same window size of 36 on all datasets during inference?

wjn922 commented 1 year ago

On Ref-Youtube-VOS and Ref-DAVIS, the window size is always set to 36 during inference.

On A2D-Sentences and JHMDB-Sentences, the training and inference phases use the same window size (following the practice in MTTR). We specify the window size in Tables 2 and 3.
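
As a rough illustration of the window-based inference described above, the sketch below splits a video into consecutive fixed-size windows of 36 frames. The function name and tensor shapes are illustrative assumptions, not the repository's actual inference code.

```python
import torch

def split_into_windows(frames: torch.Tensor, window_size: int = 36):
    """Yield consecutive chunks of at most `window_size` frames.

    frames: tensor of shape (T, C, H, W) for a single video.
    """
    num_frames = frames.shape[0]
    for start in range(0, num_frames, window_size):
        yield frames[start:start + window_size]

# Example: a 100-frame video is processed as windows of 36, 36, and 28 frames.
video = torch.randn(100, 3, 64, 64)
for clip in split_into_windows(video, window_size=36):
    print(clip.shape[0])  # 36, 36, 28
```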

Jay-IPL commented 1 year ago

Thanks! Do you have results from finetuning the pre-trained model on ref-davis17? And why did you not finetune the model on ref-davis17 before reporting the performance?

wjn922 commented 1 year ago

We tried finetuning the pre-trained model on ref-davis17 only. The performance was several points lower than using the pre-trained model directly.

We hypothesize that this is because ref-davis17 is too small. Finetuning on the combined ref-youtube and ref-davis17 datasets might be helpful, but we did not try that.
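
For the combined-dataset finetuning idea mentioned above, one straightforward way to mix the two training sets in PyTorch is `torch.utils.data.ConcatDataset`. This is only a minimal sketch under that assumption; the two dataset classes below are placeholders, not ReferFormer's actual data loaders.

```python
from torch.utils.data import Dataset, ConcatDataset, DataLoader

class PlaceholderRefYoutubeVOS(Dataset):
    """Stand-in for a Ref-Youtube-VOS training dataset (hypothetical)."""
    def __len__(self):
        return 1000
    def __getitem__(self, idx):
        return {"video_id": f"ytvos_{idx}"}

class PlaceholderRefDavis(Dataset):
    """Stand-in for a ref-davis17 training dataset (hypothetical)."""
    def __len__(self):
        return 60
    def __getitem__(self, idx):
        return {"video_id": f"davis_{idx}"}

# ConcatDataset lets samples from both sources be drawn in one finetuning run.
combined = ConcatDataset([PlaceholderRefYoutubeVOS(), PlaceholderRefDavis()])
loader = DataLoader(combined, batch_size=1, shuffle=True)
```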