Hi, thank you for your excellent work!
I noticed in Table 6 that the TGIF baseline accuracy at FLOPs=100% is much lower than the accuracy reported in the original paper:
Could you please clarify why there is such a significant difference in the baseline accuracy? Additionally, I’d like to know if Fast-V uses the same inference code as the Video-LLaVA repository.
Thank you in advance!