hkchengrex / MiVOS

[CVPR 2021] Modular Interactive Video Object Segmentation: Interaction-to-Mask, Propagation and Difference-Aware Fusion. Semi-supervised VOS as well!
https://hkchengrex.com/MiVOS/
MIT License

Temporal Information #33

Closed: FrancescaCi closed this issue 2 years ago

FrancescaCi commented 2 years ago

Hi, I am interested in your project and would like to understand in detail how it handles temporal information. Did you train your model on video datasets, so that it learns temporal information from the data? Or was the model trained on single images, considering only spatial information?

Thank you so much. Best, Francesca

hkchengrex commented 2 years ago

We trained it on DAVIS, YouTubeVOS, and BL30K, which are all video datasets. Static images are used for pre-training only.

FrancescaCi commented 2 years ago

Thank you so much. So if I would like to re-train your model, what do you suggest? Should I apply the same methodology, pre-training with static images and then training with videos? What was the advantage of pre-training with static images?

hkchengrex commented 2 years ago

Unless you have more data that you would like to include, I would suggest training it in the same way. For the benefits of pre-training, you can check STM out: https://arxiv.org/abs/1904.00607.
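
For readers unfamiliar with the static-image stage discussed above: STM-style pre-training turns a single annotated image into a short synthetic clip by applying a different random transform to each frame, so the network can practice matching an object across frames before it sees real video. The sketch below is only an illustration of that idea and is not taken from the MiVOS code. The function name `make_pseudo_clip` and its parameters are hypothetical, and it uses plain integer translations (with wrap-around at the borders) in place of the full random affine transforms.

```python
import numpy as np


def make_pseudo_clip(image, mask, num_frames=3, max_shift=8, seed=None):
    """Build a short synthetic clip from one image/mask pair.

    Hypothetical helper for illustration only. Each frame is the input
    shifted by a random integer offset, and the same offset is applied
    to the mask so the annotation stays aligned with the pixels.
    np.roll wraps content around the border; a real pipeline would use
    proper affine warps with padding or cropping instead.
    """
    rng = np.random.default_rng(seed)
    frames, masks = [], []
    for _ in range(num_frames):
        # Draw one vertical and one horizontal shift for this frame.
        dy, dx = (int(v) for v in rng.integers(-max_shift, max_shift + 1, size=2))
        frames.append(np.roll(image, shift=(dy, dx), axis=(0, 1)))
        masks.append(np.roll(mask, shift=(dy, dx), axis=(0, 1)))
    return np.stack(frames), np.stack(masks)


# Toy input: a 16x16 RGB image with a 4x4 square object mask.
image = np.zeros((16, 16, 3), dtype=np.uint8)
image[4:8, 4:8] = 255
mask = np.zeros((16, 16), dtype=np.uint8)
mask[4:8, 4:8] = 1

clip, clip_masks = make_pseudo_clip(image, mask, num_frames=3, seed=0)
```

Here `clip` has shape `(num_frames, H, W, 3)` and `clip_masks` has shape `(num_frames, H, W)`. Training on real video afterwards, as suggested above, exposes the model to appearance changes that such synthetic motion cannot reproduce.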