whwu95 / Cap4Video

【CVPR'2023 Highlight & TPAMI】Cap4Video: What Can Auxiliary Captions Do for Text-Video Retrieval?
https://arxiv.org/abs/2301.00184
MIT License

Caption encoder and query encoder share weights? #3

Closed HuazhangHu closed 1 year ago

HuazhangHu commented 1 year ago

I am very confused. If the caption encoder and the query encoder share weights, which parameters are actually optimized when computing query-caption (QC) matching? And why do we need to pass the C×D caption embeddings through MHA before multiplying them with the query embedding?

whwu95 commented 1 year ago

The parameters of the caption encoder and query encoder are shared. During the training of the query-video branch, we train the parameters of the query encoder, video encoder, and the interaction module. Once this branch is trained, we freeze the query encoder and share its parameters with the caption encoder. We then only train the multi-head attention (MHA) module added after the caption embeddings to obtain the enhanced global caption embedding.
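To make the second training stage concrete, here is a minimal PyTorch sketch of the idea described above. It is not the repository's code. `TextEncoder` is a toy stand-in for the shared (already trained) text encoder, `CaptionAggregator` is a hypothetical name for the MHA module, and the mean pooling, the symmetric cross-entropy loss, and the 0.07 temperature are all assumptions made for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TextEncoder(nn.Module):
    """Toy stand-in for the shared text encoder (hypothetical, not the repo's encoder)."""

    def __init__(self, vocab_size=100, dim=32):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.proj = nn.Linear(dim, dim)

    def forward(self, tokens):  # tokens: (N, L)
        return self.proj(self.embed(tokens).mean(dim=1))  # (N, D)


class CaptionAggregator(nn.Module):
    """MHA over the C caption embeddings, pooled into one global caption embedding."""

    def __init__(self, dim=32, heads=4):
        super().__init__()
        self.mha = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, cap_emb):  # cap_emb: (B, C, D)
        out, _ = self.mha(cap_emb, cap_emb, cap_emb)
        return out.mean(dim=1)  # (B, D); mean pooling is an assumption


torch.manual_seed(0)

# One encoder instance serves both queries and captions: the weights are shared.
# Here we pretend it was already trained in the query-video branch, then freeze it.
encoder = TextEncoder()
for p in encoder.parameters():
    p.requires_grad_(False)
encoder.eval()

# The only trainable parameters in this stage belong to the MHA module.
aggregator = CaptionAggregator()
optimizer = torch.optim.Adam(aggregator.parameters(), lr=1e-3)

B, C, L = 4, 5, 8  # batch size, captions per video, tokens per sentence
queries = torch.randint(0, 100, (B, L))
captions = torch.randint(0, 100, (B, C, L))

with torch.no_grad():  # frozen encoder: no gradients needed
    q = encoder(queries)                                   # (B, D)
    c = encoder(captions.reshape(B * C, L)).reshape(B, C, -1)  # (B, C, D)

g = aggregator(c)  # enhanced global caption embedding, (B, D)

# Query-caption similarity matrix; matching pairs lie on the diagonal.
sim = F.normalize(q, dim=-1) @ F.normalize(g, dim=-1).t()  # (B, B)
labels = torch.arange(B)
loss = 0.5 * (F.cross_entropy(sim / 0.07, labels)
              + F.cross_entropy(sim.t() / 0.07, labels))

loss.backward()
optimizer.step()
```

After `loss.backward()`, every parameter of `encoder` still has `grad is None`, while the parameters of `aggregator` have gradients. That is the sense in which sharing the encoder weights still leaves something to optimize for QC matching: the MHA module alone learns how to combine the C caption embeddings.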