The performance of Video Swin Transformer Base as a backbone on the Ref-Youtube-VOS dataset without pretraining on RefCOCO

wjn922 / ReferFormer

[CVPR2022] Official Implementation of ReferFormer

Apache License 2.0


Issue #50

Closed: buxiangzhiren closed this issue 6 months ago

buxiangzhiren commented 7 months ago

Thank you for sharing such excellent work. Have you tested Video Swin Transformer Base as a backbone on the Ref-Youtube-VOS dataset without pretraining on RefCOCO? The results I obtained with your code seem similar to those with Video Swin Tiny.

I'm unsure of the cause. It's possible there are some bugs, or the Ref-Youtube-VOS dataset might be too small for effectively fine-tuning the Video Swin Transformer Base.

Thank you for your attention!

wjn922 commented 6 months ago

We did not run experiments with Video-Swin-Base without pre-training on RefCOCO, so I'm sorry that I cannot answer your question. The small size of Ref-Youtube-VOS may be the main factor.

buxiangzhiren commented 6 months ago

Thanks!