HJYao00 closed this 11 months ago
Thank you for your interest. The quantitative evaluation results improved because of the hyperparameter settings. As the hyperparameter ablations in the paper (Figure 5) show, the performance of MovieChat degrades when all four hyperparameters are changed significantly. In v1 of the paper, we tried only one group of hyperparameter settings to demonstrate the effectiveness of our approach. In v2, we experimented with a large number of hyperparameter settings and selected the best-performing set.
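For readers who want to picture what "try many settings and keep the best" looks like in practice, here is a minimal grid-search sketch. It is only an illustration, not the authors' actual procedure: the four hyperparameter names (`hp_a` to `hp_d`), their candidate values, and the `evaluate` function are all hypothetical placeholders, and the score it returns is a made-up toy objective rather than a real benchmark result.

```python
from itertools import product

# Hypothetical search space: four placeholder hyperparameters.
# The names and values are assumptions, not taken from the paper.
SEARCH_SPACE = {
    "hp_a": [8, 16, 32],
    "hp_b": [64, 128, 256],
    "hp_c": [1, 2, 4],
    "hp_d": [2, 4],
}


def evaluate(config):
    """Stand-in for a real benchmark run; returns a toy score.

    The score is highest (zero) at an arbitrary target setting and
    decreases as each value moves away from it.
    """
    target = {"hp_a": 16, "hp_b": 128, "hp_c": 2, "hp_d": 2}
    return -sum(abs(config[k] - target[k]) / target[k] for k in target)


def grid_search(space, score_fn):
    """Score every combination in the space and return the best one."""
    names = list(space)
    best_config, best_score = None, float("-inf")
    for values in product(*(space[name] for name in names)):
        config = dict(zip(names, values))
        score = score_fn(config)
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score


best_config, best_score = grid_search(SEARCH_SPACE, evaluate)
print(best_config, best_score)
```

In a real setup, `evaluate` would run the model on a validation split with the given settings, which is usually the expensive part. An exhaustive grid is only one option; random search over the same space is a common alternative when the grid gets large.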
Thanks for your quick reply.
:)
Hi, I have two questions.
We use the QFormer twice, as described in the paper. Line 407 is the definition of video_query_output, and it uses the frame_hidden_state obtained from the first QFormer. For details on the QFormer, please refer to our paper or VideoLLaMA.
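To make the two-stage data flow concrete, here is a toy sketch in plain Python. It is not the repository's code: a real QFormer has learned projections, multiple heads, and several layers, none of which appear here. Only the names frame_hidden_state and video_query_output come from the reply above; `cross_attention`, the query vectors, and the toy frame features are assumptions made up for illustration.

```python
import math


def cross_attention(queries, keys_values):
    """Single-head dot-product cross-attention with no learned projections.

    Each query attends over keys_values and returns a softmax-weighted
    average of them. This is a simplified stand-in for one QFormer pass.
    """
    dim = len(queries[0])
    outputs = []
    for q in queries:
        scores = [
            sum(a * b for a, b in zip(q, kv)) / math.sqrt(dim)
            for kv in keys_values
        ]
        peak = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        outputs.append(
            [
                sum(w / total * kv[i] for w, kv in zip(weights, keys_values))
                for i in range(dim)
            ]
        )
    return outputs


# Toy input (assumed): 3 frames, each with 4 patch features of dimension 2.
frames = [[[float(f + p), float(f - p)] for p in range(4)] for f in range(3)]
frame_queries = [[1.0, 0.0], [0.0, 1.0]]  # hypothetical per-frame queries
video_queries = [[0.5, 0.5]]              # hypothetical video-level query

# Stage 1: the first pass runs per frame, queries attending to that
# frame's patch features. The results are collected across frames.
frame_hidden_state = []
for patches in frames:
    frame_hidden_state.extend(cross_attention(frame_queries, patches))

# Stage 2: the second pass attends over the stage-1 outputs of all frames.
video_query_output = cross_attention(video_queries, frame_hidden_state)

print(len(frame_hidden_state), len(video_query_output))
```

The point of the sketch is only the dependency: the second pass consumes the output of the first, so video_query_output summarizes information across frames rather than within a single frame.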
Hi, thanks for your work. I am curious to know what you did to improve the quantitative evaluation results from v1 to v2. Thanks.