KerolosAtef opened this issue 11 months ago
@KerolosAtef @avinash31d , Thank you for your interest in our work. Please find the details about the Vicuna-based quantitative evaluation benchmark here: https://github.com/mbzuai-oryx/Video-LLaVA/tree/main/quantitative_evaluation.
Thank you very much. However, the Vicuna model does not produce the same results on each run.
I have tried to reproduce some of the Video-ChatGPT results, and these are my numbers: ActivityNet: Acc 36.13 instead of 40.8; TGIF: Acc 63.07 instead of 66.5.
@KerolosAtef We attribute this to the randomness introduced by the temperature parameter in both the tested model and the LLM used for evaluation. This will be addressed in our future work.
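To illustrate why the temperature parameter causes run-to-run variance, here is a minimal, self-contained sketch of temperature sampling (an illustration of the general decoding mechanism, not code from this repository): at temperature 0 decoding collapses to greedy argmax and is repeatable, while any positive temperature samples from the softmax distribution and can differ between runs.

```python
import math
import random

def sample_token(logits, temperature, rng=None):
    """Pick a token index from raw logits.

    temperature == 0 -> greedy (deterministic) decoding;
    temperature > 0  -> sample from the temperature-scaled softmax.
    """
    if temperature == 0:
        # Greedy decoding: always the highest-logit token, so repeated runs agree.
        return max(range(len(logits)), key=lambda i: logits[i])
    rng = rng or random.Random()
    # Scale logits by temperature, then form a softmax distribution.
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Inverse-CDF sampling: stochastic, so outputs vary across runs.
    r = rng.random()
    acc = 0.0
    for i, p in enumerate(probs):
        acc += p
        if r <= acc:
            return i
    return len(logits) - 1

logits = [2.0, 1.0, 0.5]
print(sample_token(logits, 0.0))  # greedy: always index 0
print(sample_token(logits, 1.0))  # sampled: may differ run to run
```

In practice this is why evaluation numbers shift when either the tested model or the judge LLM decodes with a nonzero temperature; fixing the temperature to 0 (or fixing the random seed) on both sides makes the benchmark repeatable.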
Okay, good. I want to confirm one thing: for the zero-shot datasets (MSVD, MSR-VTT, ActivityNet, TGIF), did you use the test split or the validation split?
We follow the same approach as Video-ChatGPT, i.e. using test splits.
+1