[ACL 2024 🔥] Video-ChatGPT is a video conversation model capable of generating meaningful conversation about videos. It combines the capabilities of LLMs with a pretrained visual encoder adapted for spatiotemporal video representation. We also introduce a rigorous 'Quantitative Evaluation Benchmarking' for video-based conversational models.
Fail to run Video-ChatGPT Demo Offline #50

Closed: JuanJia closed this issue 5 months ago

JuanJia commented 10 months ago

Thank you for sharing the good work!

I followed "offline_demo.md" to run the demo offline, but the web page does not respond.

The terminal output is shown below. What does line 10 mean? What error occurred?

$ python video_chatgpt/demo/video_demo.py --model-name /home/nkd/Documents/jjy/comment_generator/Video-ChatGPT/LLaVA-Lightning-7B-v1-1 --projection_path /home/nkd/Documents/jjy/comment_generator/Video-ChatGPT/video_chatgpt-7B.bin
2023-09-07 14:10:24 | INFO | gradio_web_server | args: Namespace(host='', port=None, controller_url='http://localhost:210001', concurrency_count=8, model_list_mode='once', share=False, moderate=False, embed=False, model_name='/home/nkd/Documents/jjy/comment_generator/Video-ChatGPT/LLaVA-Lightning-7B-v1-1', vision_tower_name='openai/clip-vit-large-patch14', conv_mode='video-chatgpt_v1', projection_path='/home/nkd/Documents/jjy/comment_generator/Video-ChatGPT/video_chatgpt-7B.bin')
You are using the default legacy behaviour of the <class 'transformers.models.llama.tokenization_llama.LlamaTokenizer'>. If you see this, DO NOT PANIC! This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thouroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565
You are using a model of type llava to instantiate a model of type VideoChatGPT. This is not supported for all configurations of models and can yield errors.
2023-09-07 14:10:30 | ERROR | stderr | 
You are resizing the embedding layer without providing a `pad_to_multiple_of` parameter. This means that the new embedding dimension will be 32006. This might induce some performance reduction as *Tensor Cores* will not be available. For more details about this, or help on choosing the correct value for resizing, refer to this guide: https://docs.nvidia.com/deeplearning/performance/dl-performance-matrix-multiplication/index.html#requirements-tc

'NOTE: Please make sure you press the ‘Upload Video’ button and wait for it to display 'Start Chatting' before submitting question to Video-ChatGPT.' However, the 'Start Chatting' button always stays gray.

2023-09-07 14:10:48 | INFO | stdout | Loading weights from /home/nkd/Documents/jjy/comment_generator/Video-ChatGPT/video_chatgpt-7B.bin
2023-09-07 14:10:49 | INFO | stdout | Weights loaded from /home/nkd/Documents/jjy/comment_generator/Video-ChatGPT/video_chatgpt-7B.bin
2023-09-07 14:10:55 | INFO | stdout | Initialization Finished
2023-09-07 14:10:56 | INFO | stdout | Running on local URL:
2023-09-07 14:14:05 | INFO | gradio_web_server | load_demo.. params: {}
2023-09-07 14:14:18 | INFO | gradio_web_server | add_text. ip:. len: 26
2023-09-07 14:14:19 | ERROR | stderr | RuntimeError: GET was unable to find an engine to execute this computation
2023-09-07 14:15:59 | INFO | stdout | Running on public URL: https://639177a685ea0e6be8.gradio.live
mmaaz60 commented 9 months ago

Hi @JuanJia,

Apologies for the delayed reply. This looks like an issue with the installed transformers version. Please try matching the package versions listed in requirements.txt and let me know if the issue persists. Thanks
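One way to check whether the environment matches requirements.txt is sketched below. This is only an illustration, not part of the Video-ChatGPT repository: the helper names (`parse_pins`, `installed_version`, `find_mismatches`, `report`) are hypothetical, and the sketch assumes the file uses simple `name==version` pins. Lines with other specifiers (`>=`, extras, URLs) are skipped.

```python
import re
from importlib import metadata


def parse_pins(lines):
    """Return {package: version} for lines of the form 'name==version'."""
    pins = {}
    for raw in lines:
        # Drop trailing comments and environment markers, then whitespace.
        line = raw.split("#", 1)[0].split(";", 1)[0].strip()
        match = re.fullmatch(r"([A-Za-z0-9_.\-]+)\s*==\s*(\S+)", line)
        if match:
            pins[match.group(1).lower()] = match.group(2)
    return pins


def installed_version(name):
    """Return the installed version of a package, or None if it is missing."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(pins, lookup=None):
    """Return {package: (pinned, installed)} for every pin that is not met."""
    lookup = lookup or installed_version
    mismatches = {}
    for name, wanted in pins.items():
        found = lookup(name)
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


def report(path="requirements.txt"):
    """Print one line per package whose installed version differs from its pin."""
    with open(path) as handle:
        pins = parse_pins(handle)
    for name, (wanted, found) in sorted(find_mismatches(pins).items()):
        print(f"{name}: pinned {wanted}, installed {found or 'not installed'}")
```

Calling `report()` from the repository root prints the packages whose installed version differs from the pinned one. An empty output means every simple pin is satisfied, in which case the cause likely lies elsewhere.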

ashmalvayani commented 8 months ago

@JuanJia I had a similar issue. Please use the localhost link instead of the gradio link; that should resolve it.