mbzuai-oryx / Video-ChatGPT

[ACL 2024 🔥] Video-ChatGPT is a video conversation model capable of generating meaningful conversation about videos. It combines the capabilities of LLMs with a pretrained visual encoder adapted for spatiotemporal video representation. We also introduce a rigorous 'Quantitative Evaluation Benchmarking' for video-based conversational models.
https://mbzuai-oryx.github.io/Video-ChatGPT
Creative Commons Attribution 4.0 International

On the code of generating “pred_dir” #60

Closed: wyzjack closed this issue 8 months ago

wyzjack commented 8 months ago

Hi authors,

Congrats on your nice work! Your code here (https://github.com/mbzuai-oryx/Video-ChatGPT/blob/cb6f2259065c3b2036f3aefc4ca411726235f797/data/generate_instruction_qa_semi_automatic.py#L23C107-L24C21) loads context information that has already been extracted from the raw videos. Could you provide the code that generates the contents of "pred_dir"?

Many thanks in advance!

hanoonaR commented 8 months ago

Hi @wyzjack,

Thank you for your interest in our work!

We haven't made the code for generating pred_dir publicly available. However, you can get an understanding of the process from Section 4.2 of our paper. In summary:

Note on Settings:

Hope this helps. Feel free to ask if you have more specific questions, such as hyperparameters or other details.

wyzjack commented 8 months ago

Got it, thanks so much for your reply and information!