asccm opened 9 months ago
As indicated in Figure 1 of the paper, 16 GB of GPU memory may not be enough to process more than 16 frames. We are running MovieChat on an RTX 4090.
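(In case it helps with the capacity question, this is a generic PyTorch check of how much memory the card actually has free before the model is loaded; it is not part of MovieChat itself.)

```python
# Generic PyTorch sketch (not MovieChat-specific): report free/total memory
# on GPU 0 before loading the model, to see whether ~16 GB is really the limit.
import torch

if torch.cuda.is_available():
    free_bytes, total_bytes = torch.cuda.mem_get_info(0)
    print(f"GPU 0: {free_bytes / 1024**3:.1f} GiB free of {total_bytes / 1024**3:.1f} GiB total")
else:
    print("CUDA is not available; inference would fall back to CPU.")
```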
Thank you! Could I get some help with the question above?
It seems the temp frame is empty. Is your test.mp4 longer than 1 min 1 sec?
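If it helps, here is a rough way to confirm the clip length with plain OpenCV before running the demo (the path is just an example); a clip shorter than 1 min 1 sec leaves the frame requested by --cur-min 1 --cur-sec 1 empty, which is what the imwrite assertion complains about.

```python
# Rough sketch: check that the input video is at least 1 min 1 sec long,
# since --cur-min 1 --cur-sec 1 asks for a frame at that timestamp.
import cv2

cap = cv2.VideoCapture("src/examples/test.mp4")  # example path
fps = cap.get(cv2.CAP_PROP_FPS)
frame_count = cap.get(cv2.CAP_PROP_FRAME_COUNT)
cap.release()

duration_sec = frame_count / fps if fps else 0.0
print(f"duration: {duration_sec:.1f} s")
if duration_sec < 61:
    print("Too short: the frame at --cur-min 1 --cur-sec 1 will be empty.")
```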
Thanks! You were right.
But I still got an issue when running the demo:
Traceback (most recent call last):
File "/.../MovieChat/inference.py", line 372, in
I tried adding '.to(device)' in several places but couldn't find a solution. Do you know how to fix this?
It seems 'cur_position_embeddings' and 'frame_hidden_state' are not on the same device. You can check 'cur_position_embeddings.device' and 'frame_hidden_state.device' to make sure both of them are on cuda:0 or on the CPU.
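For example, something along these lines (a minimal sketch with dummy tensors; the variable names follow the error, but the shapes and the surrounding code in inference.py are only illustrative):

```python
# Minimal sketch with dummy tensors: put both tensors on the same device
# before combining them. Shapes and surrounding code are illustrative.
import torch

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

frame_hidden_state = torch.randn(1, 32, 768, device=device)  # frame features on the GPU (if available)
cur_position_embeddings = torch.randn(1, 32, 768)            # accidentally left on the CPU

print(cur_position_embeddings.device, frame_hidden_state.device)

# Move the embeddings to whatever device the frame features live on;
# after this, adding them no longer raises a device-mismatch error.
cur_position_embeddings = cur_position_embeddings.to(frame_hidden_state.device)
frame_hidden_state = frame_hidden_state + cur_position_embeddings
```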
Thank you! Managed to fix that error.
I run the demo exactly as proposed: python inference.py --cfg-path eval_configs/MovieChat.yaml --gpu-id 0 --num-beams 1 --temperature 1.0 --text-query "What is he doing?" --video-path src/examples/Cooking_cake.mp4 --fragment-video-path src/video_fragment/output.mp4 --cur-min 1 --cur-sec 1 --middle-video 1
with these models:
llama_model: "ckpt/llama2/llama-2-7b-hf"
llama_proj_model: 'ckpt/minigpt4/pretrained_minigpt4.pth'
ckpt: "ckpt/finetune-vicuna7b-v2.pth"
but get the following output:
Moviepy - Done !
Moviepy - video ready src/video_fragment/output.mp4
The question is the first step.##The first step:The following question is the first step.The first step:The following question is the first step.
The answer is off ... do you know how to solve this issue?
When I use the same video and a similar question, I don't run into this problem. Maybe you should check the versions of pretrained_minigpt4.pth and llama. We use llama-2-7b-chat as the LLM decoder, and it does not need to be merged with Vicuna.
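For reference, the model entries in eval_configs/MovieChat.yaml would then look roughly like this; the exact path is illustrative, the point is only that llama_model points at the llama-2-7b-chat weights rather than the base llama-2-7b-hf or a Vicuna-merged model:

```yaml
# Sketch of the model entries; the llama_model path is an example.
llama_model: "ckpt/llama2/Llama-2-7b-chat-hf"   # chat variant, no Vicuna merge
llama_proj_model: 'ckpt/minigpt4/pretrained_minigpt4.pth'
ckpt: "ckpt/finetune-vicuna7b-v2.pth"
```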
While running the inference exactly as recommended on the main page and using a random video test.mp4: 'python inference.py --cfg-path eval_configs/MovieChat.yaml --gpu-id 0 --num-beams 1 --temperature 1.0 --text-query "What is he doing?" --video-path src/examples/test.mp4 --fragment-video-path src/video_fragment/output.mp4 --cur-min 1 --cur-sec 1 --middle-video 1',
it crashes with the following error:
Traceback (most recent call last):
File "MovieChat/inference.py", line 363, in
cv2.imwrite(temp_frame_path, frame)
cv2.error: OpenCV(4.7.0) /io/opencv/modules/imgcodecs/src/loadsave.cpp:783: error: (-215:Assertion failed) !_img.empty() in function 'imwrite'
Using these models in the .yaml:
llama_model: "ckpt/Llama-2-7b-hf"
llama_proj_model: 'ckpt/minigpt4/pretrained_minigpt4.pth'
ckpt: "ckpt/finetune-vicuna7b-v2.pth"
Q: How can I resolve this?