mbzuai-oryx / Video-ChatGPT

[ACL 2024 🔥] Video-ChatGPT is a video conversation model capable of generating meaningful conversation about videos. It combines the capabilities of LLMs with a pretrained visual encoder adapted for spatiotemporal video representation. We also introduce a rigorous 'Quantitative Evaluation Benchmarking' for video-based conversational models.
https://mbzuai-oryx.github.io/Video-ChatGPT
Creative Commons Attribution 4.0 International
1.04k stars, 92 forks

Code for Dataset Generation #86

Open yeliudev opened 5 months ago

yeliudev commented 5 months ago

Hi @mmaaz60, thanks for sharing this great work!

I was wondering whether you plan to share the code for the semi-automated dataset generation (the pipeline that uses Katna to extract keyframes -> BLIP2 & GRIT to generate frame-wise captions -> Tag2Text to filter them). If not, would it be possible to share the dense captions generated by these large vision models?
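
For readers following along, the three stages named above can be sketched roughly as below. This is only an illustration of the data flow, not the authors' code: the function names `filter_captions_by_tags` and `caption_video` are hypothetical, the captioners and tagger are passed in as plain callables rather than real Katna, BLIP2, GRIT, or Tag2Text calls, and the filtering rule (keep a caption only if it mentions at least one predicted tag) is an assumption about how the tag-based filtering might work.

```python
from typing import Any, Callable, Iterable, List, Sequence


def filter_captions_by_tags(captions: Iterable[str], tags: Iterable[str]) -> List[str]:
    """Keep captions that mention at least one tag (case-insensitive word match).

    Assumed filtering rule, for illustration only.
    """
    tag_set = {t.lower() for t in tags}
    kept = []
    for caption in captions:
        # Split into words and drop trailing punctuation before comparing.
        words = {w.strip(".,!?").lower() for w in caption.split()}
        if words & tag_set:
            kept.append(caption)
    return kept


def caption_video(
    keyframes: Sequence[Any],
    captioners: Sequence[Callable[[Any], str]],
    tagger: Callable[[Any], Iterable[str]],
) -> List[List[str]]:
    """Run every captioner on every keyframe, then filter by the frame's tags.

    `keyframes` stands in for the output of a keyframe extractor,
    `captioners` for the frame-level captioning models, and `tagger`
    for the tagging model. All three are hypothetical placeholders.
    """
    results = []
    for frame in keyframes:
        candidates = [captioner(frame) for captioner in captioners]
        results.append(filter_captions_by_tags(candidates, tagger(frame)))
    return results
```

Passing the models in as callables keeps the sketch independent of any particular library version; each placeholder would need to be swapped for a wrapper around the actual model.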

Thank you!

sunwhw commented 4 months ago

Me too!

mmaaz60 commented 2 weeks ago

Hi @yeliudev @sunwhw

I appreciate your interest in our work. We recently released VideoGPT+, along with an improved semi-automatic video annotation pipeline for dataset generation. All the scripts needed to run the pipeline have been released as well.

Please check it out on GitHub and HuggingFace.

Please let me know if you have any questions. Good Luck!