[ACL 2024 🔥] Video-ChatGPT is a video conversation model capable of generating meaningful conversations about videos. It combines the capabilities of LLMs with a pretrained visual encoder adapted for spatiotemporal video representation. We also introduce a rigorous 'Quantitative Evaluation Benchmarking' framework for video-based conversational models.
We are preparing the pretrained models and the code for release, which should happen within one month. Stay tuned and enjoy the demo!
Could you please share a fine-tuning recipe for a custom dataset?