Questions about the value of 'loss_sparse_w' in command

microsoft / SwinBERT

Research code for CVPR 2022 paper "SwinBERT: End-to-End Transformers with Sparse Attention for Video Captioning"

https://arxiv.org/abs/2111.13196

MIT License


Questions about the value of 'loss_sparse_w' in command #42

Open tiesanguaixia opened 1 year ago

tiesanguaixia commented 1 year ago

I guess it's the regularization hyperparameter for $Loss_{SPARSE}$, i.e. the $\lambda$ in your paper. According to the appendix, the model seems to perform best on MSR-VTT when $\lambda = 5$. So why is 'loss_sparse_w' set to 0.5 in the command? Do we need to change it to 5? Thank you!
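To make the question concrete, here is a minimal sketch of how such a weight usually enters the objective, assuming the total loss is the caption loss plus $\lambda$ times a sparsity term. The function names and the mean reduction are assumptions for illustration only and are not taken from the SwinBERT code. Whether the repository sums or averages the term would change the effective scale of `loss_sparse_w`.

```python
# Hypothetical sketch, not the SwinBERT implementation.
# Assumption: total loss = caption loss + loss_sparse_w * sparsity term,
# where the sparsity term is an L1-style penalty on the attention mask.

def sparse_loss(mask):
    """Mean absolute value of the mask entries (mean reduction is an assumption)."""
    total = sum(abs(v) for row in mask for v in row)
    count = sum(len(row) for row in mask)
    return total / count


def total_loss(caption_loss, mask, loss_sparse_w):
    """Combine the caption loss with the weighted sparsity penalty."""
    return caption_loss + loss_sparse_w * sparse_loss(mask)


# Toy 2x2 mask with made-up values, only to show how the weight scales the penalty.
mask = [[0.9, 0.1], [0.2, 0.8]]
for w in (0.5, 5.0):
    print(f"loss_sparse_w={w}: total loss = {total_loss(2.0, mask, w):.4f}")
```

With the same mask, going from 0.5 to 5 makes the penalty ten times larger relative to the caption loss, which is why the mismatch between the command and the appendix matters.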

tiesanguaixia commented 1 year ago

@kevinlin311tw