mbzuai-oryx / Video-ChatGPT

[ACL 2024 🔥] Video-ChatGPT is a video conversation model capable of generating meaningful conversation about videos. It combines the capabilities of LLMs with a pretrained visual encoder adapted for spatiotemporal video representation. We also introduce a rigorous 'Quantitative Evaluation Benchmarking' for video-based conversational models.
https://mbzuai-oryx.github.io/Video-ChatGPT
Creative Commons Attribution 4.0 International
1.05k stars 92 forks
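The description mentions adapting a pretrained visual encoder for spatiotemporal video representation. Below is a minimal sketch of that idea. It assumes per-frame patch features of shape (T, N, D) from an image encoder. Averaging over patches gives one temporal token per frame, and averaging over frames gives one spatial token per patch. The concatenated tokens then pass through a linear layer into the LLM's embedding space. The function names and shapes are illustrative assumptions, not code from the repository.

```python
import numpy as np


def spatiotemporal_tokens(frame_features: np.ndarray) -> np.ndarray:
    """Pool per-frame patch features into a sequence of video tokens.

    frame_features: shape (T, N, D) = (frames, patch tokens per frame, feature dim).
    Returns shape (T + N, D): T temporal tokens followed by N spatial tokens.
    """
    if frame_features.ndim != 3:
        raise ValueError("expected features of shape (T, N, D)")
    # Average over patches: one token per frame -> temporal features (T, D)
    temporal = frame_features.mean(axis=1)
    # Average over frames: one token per patch -> spatial features (N, D)
    spatial = frame_features.mean(axis=0)
    return np.concatenate([temporal, spatial], axis=0)


def project(tokens: np.ndarray, weight: np.ndarray, bias: np.ndarray) -> np.ndarray:
    """Linear projection into the LLM embedding space: (L, D) -> (L, E)."""
    return tokens @ weight + bias


# Usage with illustrative sizes: 8 frames, 16 patches per frame, 32-dim features
rng = np.random.default_rng(0)
feats = rng.standard_normal((8, 16, 32))
tokens = spatiotemporal_tokens(feats)  # shape (24, 32): 8 temporal + 16 spatial
embeds = project(tokens, rng.standard_normal((32, 64)) * 0.02, np.zeros(64))  # (24, 64)
```

Pooling keeps the token count at T + N instead of T × N, which is what makes the video prefix short enough to place in the LLM's context.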

Adding Modality #104

OdinfromAsgard closed 2 weeks ago

OdinfromAsgard commented 1 month ago

Hi, nice work! If I want to add more modalities, such as audio tokens or pose tokens from a video, what changes would I need to make to incorporate them? Or does this codebase support only video?

mmaaz60 commented 2 weeks ago

Hi @OdinfromAsgard,

I appreciate your interest in our work. Currently, this codebase only supports video features.
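Because the codebase handles only video features, adding audio or pose tokens would require writing your own adapter. One common pattern is to give each modality its own linear projector into the LLM's embedding width and concatenate the projected tokens along the sequence axis before they reach the language model. The sketch below is a hypothetical illustration of that pattern and is not part of Video-ChatGPT. `LinearProjector`, `build_prefix`, and all dimensions are assumptions chosen for illustration, and the projector weights are randomly initialized here.

```python
import numpy as np


class LinearProjector:
    """Hypothetical per-modality adapter: maps (L, d_in) features to (L, d_model)."""

    def __init__(self, d_in: int, d_model: int, rng: np.random.Generator):
        # Small random init; a real adapter would be trained
        self.weight = rng.standard_normal((d_in, d_model)) * 0.02
        self.bias = np.zeros(d_model)

    def __call__(self, x: np.ndarray) -> np.ndarray:
        return x @ self.weight + self.bias


def build_prefix(modal_features: dict, projectors: dict) -> np.ndarray:
    """Project each modality and concatenate along the sequence axis.

    The dict order (insertion order) fixes the position of each modality's
    tokens in the resulting prefix.
    """
    parts = []
    for name, feats in modal_features.items():
        if name not in projectors:
            raise KeyError(f"no projector registered for modality '{name}'")
        parts.append(projectors[name](feats))
    return np.concatenate(parts, axis=0)


# Usage: video tokens (5 x 8) and audio tokens (3 x 4) mapped to a shared width of 16
rng = np.random.default_rng(0)
projs = {"video": LinearProjector(8, 16, rng), "audio": LinearProjector(4, 16, rng)}
video = rng.standard_normal((5, 8))
audio = rng.standard_normal((3, 4))
prefix = build_prefix({"video": video, "audio": audio}, projs)  # shape (8, 16)
```

A model trained only on video prefixes would still need further training, at least of the new projectors, before it can make use of the extra tokens.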