HuazhangHu closed this issue 1 year ago
In our paper, the query-video branch and the query-caption branch are trained separately. We first train the query-video branch for 5 epochs. Once this branch is trained, we proceed to train the query-caption branch.
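One possible reading of this two-stage schedule is sketched below. It is not the repository's actual training code. The 5-epoch first stage comes from the reply above; `QC_EPOCHS`, `active_loss_weights`, and `total_loss` are hypothetical names introduced only for illustration. The sketch also assumes that each stage optimizes only its own loss term, which the reply does not state explicitly.

```python
from typing import Dict

QV_EPOCHS = 5  # from the reply: the query-video branch is trained for 5 epochs
QC_EPOCHS = 5  # hypothetical: the thread does not say how long stage 2 lasts


def active_loss_weights(epoch: int) -> Dict[str, float]:
    """Return per-branch loss weights for a 0-indexed epoch.

    Assumption: stage 1 uses only the query-video loss and
    stage 2 uses only the query-caption loss.
    """
    if epoch < QV_EPOCHS:
        return {"qv": 1.0, "qc": 0.0}
    return {"qv": 0.0, "qc": 1.0}


def total_loss(loss_qv: float, loss_qc: float, epoch: int) -> float:
    """Combine the two branch losses according to the current stage."""
    weights = active_loss_weights(epoch)
    return weights["qv"] * loss_qv + weights["qc"] * loss_qc


if __name__ == "__main__":
    for epoch in range(QV_EPOCHS + QC_EPOCHS):
        stage = "query-video" if epoch < QV_EPOCHS else "query-caption"
        print(epoch, stage, active_loss_weights(epoch))
```

Under this reading, the total loss is still formally a sum of the two terms, but only one term is non-zero in any given epoch.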
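I read your code and found that captions are already used in train_video.py. How, then, should I understand your statement that the first 5 epochs train the query-video branch? (As I understand it, if the first 5 epochs are meant to train the query-video branch, captions should not appear at all: if captions are present, the query encoder also processes caption information, and then the first five epochs no longer train only the query-video branch.) I am not sure whether my understanding is correct. I am confused about this part and hope to get your reply.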
In the paper, you mention that "To reduce conflict between the two branches, the query-video branch is trained first, followed by the query-caption branch". However, you also state that "The total loss L is the sum of Query-Video loss L_{QV} and Query-Caption loss L_{QC}". Are the two branches trained separately? My question is: what is the loss when the query-video branch is trained first, and what is the loss when the query-caption branch is trained afterwards? In addition, how many epochs does it take to train the query-video branch?