Why don't you use top_k and km during the training phase?

hkchengrex / Mask-Propagation

[CVPR 2021] MiVOS - Mask Propagation module. Reproduced STM (and better) with training code :star2:. Semi-supervised video object segmentation evaluation.

https://hkchengrex.github.io/MiVOS/

MIT License

128 stars, 22 forks

Why don't you use top_k and km during the training phase? #18

Closed: longmalongma closed this issue 3 years ago

longmalongma commented 3 years ago

Looking at your code, I was a little confused about why you don't use top_k and km during the training phase. They are used in the evaluation phase, right? Is it bad to use top_k and km in training?

By the way, can the network for evaluation be inconsistent with the network for training?

hkchengrex commented 3 years ago

It is difficult to optimize with top-k/kernel memory due to sparse gradients.
Top-k isn't necessary for training, which only propagates through three frames. See the motivation in our paper.
What do you mean by inconsistent? Sure, they are not the same (they can hardly be, unless you have tons of training resources and data).
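
To illustrate the sparse-gradient point, here is a minimal pure-Python sketch. It is not code from this repository. It assumes a toy memory read in which a query scores five memory entries and reads a weighted sum of scalar values. The names `topk_softmax`, `read`, and `grad` are hypothetical, gradients are estimated by finite differences rather than autograd, and only top-k is modeled (kernel memory is left out).

```python
import math

def softmax(scores):
    # Numerically stable softmax over all scores.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def topk_softmax(scores, k):
    # Keep the k largest scores, normalize over them only, zero elsewhere.
    keep = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    m = max(scores[i] for i in keep)
    exps = {i: math.exp(scores[i] - m) for i in keep}
    total = sum(exps.values())
    return [exps[i] / total if i in exps else 0.0 for i in range(len(scores))]

def read(scores, values, k=None):
    # Memory readout: weighted sum of values, dense or top-k weights.
    weights = softmax(scores) if k is None else topk_softmax(scores, k)
    return sum(w * v for w, v in zip(weights, values))

def grad(scores, values, k=None, eps=1e-5):
    # Central finite-difference gradient of the readout w.r.t. each score.
    out = []
    for i in range(len(scores)):
        up = list(scores)
        up[i] += eps
        down = list(scores)
        down[i] -= eps
        out.append((read(up, values, k) - read(down, values, k)) / (2 * eps))
    return out

scores = [2.0, 1.0, 0.5, -1.0, -2.0]   # toy affinities (assumed values)
values = [1.0, 0.0, 3.0, 2.0, 5.0]     # toy memory values (assumed values)

dense_grad = grad(scores, values)        # every entry receives some gradient
sparse_grad = grad(scores, values, k=2)  # only the two selected entries do
print(dense_grad)
print(sparse_grad)
```

In this toy setting the dense read gives a nonzero gradient to all five scores, while the top-2 read gives exactly zero gradient to the three entries that were filtered out. Those entries get no learning signal from that step.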

longmalongma commented 3 years ago

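Thank you for your reply. So is your training model exactly the same as STM? By "inconsistent" I mean that the training network and the evaluation network are not the same network. If they are not the same network, won't loading the model trained with the training network at evaluation time raise an error because the two models don't match?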

hkchengrex commented 3 years ago

It is similar to but not exactly the same as STM.
The fact that I can train the network and evaluate it without error already shows you the answer.
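
One possible reading of why no error occurs, sketched under assumptions: top-k filtering has no learned parameters, so a network that applies it can share the exact same weights as one that does not. The PyTorch snippet below is a hypothetical toy (`TinyReader` and its fields are made up for illustration and are not this repository's classes). It loads weights from a "training" configuration into an "evaluation" configuration that enables top-k.

```python
import torch
import torch.nn as nn

class TinyReader(nn.Module):
    """Hypothetical module: a learned key projection plus a memory read."""

    def __init__(self, dim=8, top_k=None):
        super().__init__()
        self.key_proj = nn.Linear(dim, dim)  # the only learned parameters
        self.top_k = top_k                   # plain attribute, not in state_dict

    def forward(self, query, mem_keys, mem_values):
        # query: (Q, dim), mem_keys: (M, dim), mem_values: (M, C)
        affinity = self.key_proj(query) @ mem_keys.t()  # (Q, M)
        if self.top_k is not None:
            # Softmax over the k best matches only; all other weights are zero.
            vals, idx = torch.topk(affinity, k=self.top_k, dim=1)
            weights = torch.zeros_like(affinity).scatter(
                1, idx, torch.softmax(vals, dim=1)
            )
        else:
            weights = torch.softmax(affinity, dim=1)
        return weights @ mem_values  # (Q, C)

train_net = TinyReader(top_k=None)  # dense read, as one might use in training
eval_net = TinyReader(top_k=3)      # top-k read, as one might use in evaluation

# The two configurations have identical parameter names and shapes.
result = eval_net.load_state_dict(train_net.state_dict())

with torch.no_grad():
    out = eval_net(torch.randn(4, 8), torch.randn(10, 8), torch.randn(10, 2))
print(result, out.shape)
```

Because `top_k` is an ordinary Python attribute rather than a parameter or buffer, it never appears in the checkpoint, and strict loading succeeds in this toy case.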