Jay-IPL opened this issue 1 year ago

Nice work! The paper says "most of our experiments follow the pretrain-then-finetune process." However, this GitHub repo says "As described in the paper, we report the results using the model trained on Ref-Youtube-VOS without finetune." Did you finetune the pre-trained model on Ref-DAVIS17?
We did not finetune the pre-trained model on Ref-DAVIS17.
Thanks for the clarification!
I saw the details in the supplementary section: training uses a window size of 5. What about inference? Did you use the same window size of 36 on all datasets during inference?
On Ref-Youtube-VOS and Ref-DAVIS17, the window size is always set to 36 during inference.
On A2D-Sentences and JHMDB-Sentences, the training and inference phases use the same window size (following the practice in MTTR), and we specify the value in Tables 2 and 3.
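For readers who want to reproduce this, here is a minimal sketch of window-based inference under the setup described above; `infer_video`, `model`, and the clip-level call signature are hypothetical stand-ins for the repo's actual inference code.

```python
# Minimal sketch of clip-based inference with a fixed window size.
# `model` and its (clip, text) call signature are hypothetical
# placeholders for the repo's actual inference code.
import torch

WINDOW_SIZE = 36  # inference window used for Ref-Youtube-VOS and Ref-DAVIS17

def infer_video(model, frames, text_query):
    """Run referring segmentation over a full video, one window at a time.

    frames: tensor of shape (T, C, H, W) holding all video frames.
    text_query: the referring expression for the target object.
    Returns per-frame masks concatenated over the whole video.
    """
    all_masks = []
    with torch.no_grad():
        for start in range(0, frames.shape[0], WINDOW_SIZE):
            clip = frames[start:start + WINDOW_SIZE]  # last clip may be shorter
            masks = model(clip.unsqueeze(0), [text_query])  # hypothetical signature
            all_masks.append(masks.squeeze(0))
    return torch.cat(all_masks, dim=0)
```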
Thanks! Do you have results from finetuning the pre-trained model on Ref-DAVIS17? And why did you not finetune the model on Ref-DAVIS17 before reporting the performance?
We have tried finetuning the pre-trained model on Ref-DAVIS17 alone. The performance was several points lower than using the pre-trained model directly.
We hypothesize that this is because Ref-DAVIS17 is too small. Finetuning on the combination of Ref-Youtube-VOS and Ref-DAVIS17 might help, but we did not try that.
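For anyone who wants to try the combined-dataset finetuning suggested above, a minimal sketch using PyTorch's `ConcatDataset` follows; `RefYouTubeVOSDataset` and `RefDAVIS17Dataset` are hypothetical placeholders for the repo's actual dataset classes.

```python
# Minimal sketch of finetuning on Ref-Youtube-VOS + Ref-DAVIS17 combined.
# RefYouTubeVOSDataset / RefDAVIS17Dataset are hypothetical placeholders
# for the repo's actual dataset classes.
from torch.utils.data import ConcatDataset, DataLoader

ytvos = RefYouTubeVOSDataset(split="train", window_size=5)  # hypothetical class
davis = RefDAVIS17Dataset(split="train", window_size=5)     # hypothetical class

# ConcatDataset simply chains the two datasets; with uniform shuffling,
# sampling is proportional to dataset size, so the much larger
# Ref-Youtube-VOS dominates, which may mitigate overfitting to the
# small Ref-DAVIS17.
combined = ConcatDataset([ytvos, davis])
loader = DataLoader(combined, batch_size=2, shuffle=True, num_workers=4)
```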