I guess it's the regularization hyperparameter of $Loss_{SPARSE}$, i.e. the $\lambda$ in your paper. In the appendix, it seems that for MSR-VTT the model performs best when $\lambda = 5$. But why is the value of 'loss_sparse_w' in the command 0.5? Do we need to change it to 5? Thank you!