microsoft / SwinBERT

Research code for CVPR 2022 paper "SwinBERT: End-to-End Transformers with Sparse Attention for Video Captioning"
https://arxiv.org/abs/2111.13196
MIT License

Question for training process #2

Closed: WangLanxiao closed this issue 2 years ago

WangLanxiao commented 2 years ago

Thanks for your contributions. When I train the model following the README, I hit this error:

#########################################
SwinBERT/src/modeling/load_bert.py", line 12, in get_bert_model
config.img_feature_type = 'frcnn'
AttributeError: 'NoneType' object has no attribute 'img_feature_type'
#########################################

While debugging, I found that the code seems to need the data in models/bert-base-uncased/. Is this error caused by missing files?

Thanks again for your contribution.

kevinlin311tw commented 2 years ago

Thanks for pointing it out! I just added it. It is the standard BERT config from Hugging Face.
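For anyone who hits the same `AttributeError`, a pre-flight check along these lines may help confirm that the config is in place before training starts. This is only a minimal sketch: the helper name `check_bert_config` is hypothetical, and the file name `config.json` is an assumption based on the usual Hugging Face layout, not something confirmed for this repository.

```python
import json
from pathlib import Path


def check_bert_config(model_dir):
    """Return the parsed config dict, or raise with a readable message.

    Hypothetical helper, not part of the SwinBERT code base.
    """
    # Assumed file name (Hugging Face convention); adjust if the repo differs.
    config_path = Path(model_dir) / "config.json"
    if not config_path.is_file():
        raise FileNotFoundError(
            f"Expected a BERT config at {config_path}; "
            "without it the loader may end up with config=None."
        )
    with config_path.open(encoding="utf-8") as f:
        return json.load(f)
```

Calling `check_bert_config("models/bert-base-uncased")` from the repository root would then fail early with a clear message instead of the `'NoneType' object has no attribute 'img_feature_type'` error.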

WangLanxiao commented 2 years ago

Thanks for your reply! By the way, I noticed that the MSRVTT URL in Table 1 and the one in the 32-frame model table are the same. The URL in the second table might be wrong.

[URL](https://datarelease.blob.core.windows.net/swinbert/models/msrvtt-table1.zip)

kevinlin311tw commented 2 years ago

They should be the same. The best-performing model on MSRVTT is based on the 32-frame setting.

tiesanguaixia commented 1 year ago

> Thanks for your reply! By the way, I noticed that the MSRVTT URL in Table 1 and the one in the 32-frame model table are the same. The URL in the second table might be wrong.
>
> [URL](https://datarelease.blob.core.windows.net/swinbert/models/msrvtt-table1.zip)

Hi! Have you reproduced the results in the paper? May I ask whether you adjusted the values of 'loss_sparse_w' and 'learning_rate' in the command? For 'loss_sparse_w', I guess it is the regularization hyperparameter of $Loss_{SPARSE}$, i.e. the $\lambda$ in the paper. In the appendix, it seems that for MSR-VTT the model performs best when $\lambda = 5$. But why is the default value of 'loss_sparse_w' in the command 0.5? Do I need to change it to 5? Thanks a lot!
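To make the question concrete, here is a rough sketch of how such a weight would enter the objective. It assumes that the sparsity term is an L1 penalty on the learnable attention mask scaled by $\lambda$, and that 'loss_sparse_w' plays the role of $\lambda$; that mapping is the guess above and is not confirmed in this thread. The function names and the toy mask are hypothetical and are not taken from the SwinBERT code.

```python
def sparsity_penalty(mask, weight):
    """Weighted L1 penalty: weight * sum(|v|) over every mask entry."""
    return weight * sum(abs(v) for row in mask for v in row)


def total_loss(caption_loss, mask, loss_sparse_w=0.5):
    """Captioning loss plus the weighted sparsity penalty.

    loss_sparse_w is assumed to correspond to lambda in the paper.
    """
    return caption_loss + sparsity_penalty(mask, loss_sparse_w)


# Toy 2x2 attention mask, purely illustrative.
toy_mask = [[0.5, 0.0], [1.0, 0.5]]
loss_default = total_loss(2.0, toy_mask)                  # weight 0.5
loss_lambda5 = total_loss(2.0, toy_mask, loss_sparse_w=5.0)  # weight 5
```

Under these assumptions, moving from 0.5 to 5 makes the sparsity term ten times larger relative to the captioning loss, so the two settings are not interchangeable.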