[ACL 2024 🔥] Video-ChatGPT is a video conversation model capable of generating meaningful conversation about videos. It combines the capabilities of LLMs with a pretrained visual encoder adapted for spatiotemporal video representation. We also introduce a rigorous 'Quantitative Evaluation Benchmarking' for video-based conversational models.
Great project. I looked at the frame selection code you wrote, which samples frames at even intervals across the video. However, the question-answer pairs in the JSON file seem to describe only a few frames, not the 100 frames that are sampled. Does this mismatch cause any problems, or is it reasonable to do it this way? 😄
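To make the question concrete, here is a minimal sketch of what I mean by even (uniform) frame selection. The function name `uniform_frame_indices` and the default of 100 samples are my own assumptions for illustration, and this is not the repository's actual code, which may choose indices differently.

```python
def uniform_frame_indices(total_frames, num_samples=100):
    """Return evenly spaced frame indices that span the whole video.

    Hypothetical helper for illustration only.
    """
    if total_frames <= 0:
        raise ValueError("total_frames must be positive")
    # Never request more frames than the video contains.
    num_samples = min(num_samples, total_frames)
    if num_samples == 1:
        return [0]
    # Spacing between consecutive samples so that the first index is 0
    # and the last index is total_frames - 1.
    step = (total_frames - 1) / (num_samples - 1)
    return [round(i * step) for i in range(num_samples)]


if __name__ == "__main__":
    indices = uniform_frame_indices(1000)
    print(len(indices), indices[0], indices[-1])
```

With sampling like this, the selected frames cover the entire video, so a question-answer pair written about a short moment may correspond to only a handful of the sampled frames.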