mbzuai-oryx / Video-ChatGPT

[ACL 2024 🔥] Video-ChatGPT is a video conversation model capable of generating meaningful conversation about videos. It combines the capabilities of LLMs with a pretrained visual encoder adapted for spatiotemporal video representation. We also introduce a rigorous 'Quantitative Evaluation Benchmarking' for video-based conversational models.
https://mbzuai-oryx.github.io/Video-ChatGPT
Creative Commons Attribution 4.0 International

openai/clip-vit-large-patch14 were not used when initializing CLIPVisionModel #43

Closed: minuenergy closed this issue 1 year ago

minuenergy commented 1 year ago

[image: screenshot of the warning output]

I am running this command:

python video_chatgpt/demo/video_demo.py \
    --model-name Video-ChatGPT_Models/LLaVA-Lightning-7B-v1-1 \
    --projection_path Video-ChatGPT_Models/video_chatgpt-7B.bin

The demo seems to work well despite the warning, but I'm not sure whether this is really okay. Any advice would be appreciated.

jhj7905 commented 1 year ago

@minuenergy No problem. The weights listed in the picture belong to the encoder, so you don't need to worry.

mmaaz60 commented 1 year ago

Hi

As pointed out by @jhj7905, these warnings are expected. Thanks.
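
To illustrate why a warning of this kind can be harmless, here is a minimal sketch. It assumes that the full CLIP checkpoint stores weights for more components than a vision-only model defines, so the extra entries are simply left unused at load time. The helper `unused_checkpoint_keys` and the short key lists are hypothetical and for illustration only. They are not the loader's actual code or the real checkpoint contents.

```python
def unused_checkpoint_keys(checkpoint_keys, model_keys):
    """Return the checkpoint entries that the model has no parameter for."""
    model_keys = set(model_keys)
    return sorted(k for k in checkpoint_keys if k not in model_keys)


# Illustrative key names only (assumption); a real checkpoint has many more.
clip_checkpoint = [
    "vision_model.embeddings.class_embedding",
    "vision_model.encoder.layers.0.mlp.fc1.weight",
    "text_model.embeddings.token_embedding.weight",
    "text_model.encoder.layers.0.mlp.fc1.weight",
    "logit_scale",
]

# A vision-only model defines parameters for the vision tower alone.
vision_only_model = [
    "vision_model.embeddings.class_embedding",
    "vision_model.encoder.layers.0.mlp.fc1.weight",
]

unused = unused_checkpoint_keys(clip_checkpoint, vision_only_model)
missing = unused_checkpoint_keys(vision_only_model, clip_checkpoint)

print("Unused checkpoint weights:", unused)
print("Model weights missing from checkpoint:", missing)
```

In this sketch every parameter the vision-only model needs is present in the checkpoint, and only entries the model never defines are reported as unused. The case to worry about would be the reverse one, where model weights are missing from the checkpoint and end up newly initialized.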

chakrabortyrajatsubhra commented 8 months ago

Hi, a follow-up to this question: can we use the ViT base version instead of large for inference? Also, where exactly does the above warning come from? As far as I can see, it is not in the code base.