shams2023 closed this issue 8 months ago
The parameters of the caption encoder and query encoder are shared. During the training of the query-video branch, we train the parameters of the query encoder, video encoder, and the interaction module. Once this branch is trained, we freeze the query encoder and share its parameters with the caption encoder. We then only train the multi-head attention (MHA) module added after the caption embeddings to obtain the enhanced global caption embedding.
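A minimal PyTorch sketch of the second stage described above, not the repository's actual code. The class names `TextEncoder` and `CaptionBranch`, the toy dimensions, and the mean pooling used to form the global embedding are all assumptions made for illustration. It shows only the freezing and sharing pattern: the caption branch reuses the same encoder object as the query branch, and only the MHA parameters remain trainable.

```python
import torch
from torch import nn


class TextEncoder(nn.Module):
    """Toy stand-in for the shared query/caption text encoder (hypothetical)."""

    def __init__(self, vocab_size=100, dim=16):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.proj = nn.Linear(dim, dim)

    def forward(self, token_ids):                 # (batch, seq_len)
        return self.proj(self.embed(token_ids))   # (batch, seq_len, dim)


class CaptionBranch(nn.Module):
    """Frozen shared encoder followed by a trainable MHA module (hypothetical)."""

    def __init__(self, text_encoder, dim=16, heads=4):
        super().__init__()
        # Same object as the query encoder, so the parameters are shared.
        self.text_encoder = text_encoder
        self.mha = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, caption_ids):
        with torch.no_grad():                     # encoder is frozen in this stage
            tokens = self.text_encoder(caption_ids)
        attended, _ = self.mha(tokens, tokens, tokens)
        # Mean pooling is an assumption; the thread does not say how pooling is done.
        return attended.mean(dim=1)               # (batch, dim)


torch.manual_seed(0)
query_encoder = TextEncoder()
# Stage 1 (not shown): train query_encoder, the video encoder,
# and the interaction module on query-video pairs.

# Stage 2: freeze the query encoder and reuse it for captions.
for p in query_encoder.parameters():
    p.requires_grad = False

caption_branch = CaptionBranch(query_encoder)
trainable = [n for n, p in caption_branch.named_parameters() if p.requires_grad]
optimizer = torch.optim.Adam(
    (p for p in caption_branch.parameters() if p.requires_grad), lr=1e-3
)

caption_ids = torch.randint(0, 100, (2, 5))       # 2 captions, 5 tokens each
global_caption_emb = caption_branch(caption_ids)
```

Listing `trainable` is a quick way to confirm that only the `mha.*` parameters would be updated by the optimizer in this stage.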
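I read your code and found that captions are already used in train_video.py. How, then, should I understand your statement that the first 5 epochs train the query-video branch? (As I understand it, if the first 5 epochs are meant to train the query-video branch, captions should not appear at all. If captions are present, the query encoder also processes caption information, so wouldn't that mean the first five epochs are not really training only the query-video branch?) I'm not sure whether my understanding is correct. I'm quite confused about this part and hope to get your reply.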
Originally posted by @shams2023 in https://github.com/whwu95/Cap4Video/issues/4#issuecomment-1844964707