rese1f / MovieChat

[CVPR 2024] MovieChat: From Dense Token to Sparse Memory for Long Video Understanding
https://rese1f.github.io/MovieChat/
BSD 3-Clause "New" or "Revised" License

How to Run the Demo #42

Open asccm opened 9 months ago

asccm commented 9 months ago

While running inference exactly as recommended on the main page, with a random video test.mp4:

```bash
python inference.py --cfg-path eval_configs/MovieChat.yaml --gpu-id 0 --num-beams 1 --temperature 1.0 --text-query "What is he doing?" --video-path src/examples/test.mp4 --fragment-video-path src/video_fragment/output.mp4 --cur-min 1 --cur-sec 1 --middle-video 1
```

it crashes with the following error:

```
Traceback (most recent call last):
  File "MovieChat/inference.py", line 363, in
    cv2.imwrite(temp_frame_path, frame)
cv2.error: OpenCV(4.7.0) /io/opencv/modules/imgcodecs/src/loadsave.cpp:783: error: (-215:Assertion failed) !_img.empty() in function 'imwrite'
```

Using these models in the .yaml:

```yaml
llama_model: "ckpt/Llama-2-7b-hf"
llama_proj_model: 'ckpt/minigpt4/pretrained_minigpt4.pth'
ckpt: "ckpt/finetune-vicuna7b-v2.pth"
```

How can I resolve this?

Espere-1119-Song commented 9 months ago

As indicated in Figure 1 of the paper, 16 GB of GPU memory may not suffice for processing more than 16 frames. We run MovieChat on an RTX 4090.

asccm commented 9 months ago

Thank you! Could I get some help with the question above?

Espere-1119-Song commented 9 months ago

It seems the temp frame is empty. Is your test.mp4 longer than 1 min 1 sec? `--cur-min 1 --cur-sec 1` points at that timestamp, so the video needs to be at least that long.
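
A quick standalone sanity check (not MovieChat code; the path and timestamp are just taken from your command):

```python
import cv2

# Does test.mp4 actually contain a frame at 1 min 1 sec (--cur-min 1 --cur-sec 1)?
cap = cv2.VideoCapture("src/examples/test.mp4")
fps = cap.get(cv2.CAP_PROP_FPS)
n_frames = cap.get(cv2.CAP_PROP_FRAME_COUNT)
if fps > 0:
    print(f"duration ~ {n_frames / fps:.1f} s")

cap.set(cv2.CAP_PROP_POS_MSEC, 61 * 1000)  # seek to 1:01
ret, frame = cap.read()
cap.release()

if not ret or frame is None:
    print("No frame at 1:01 -> this is what makes cv2.imwrite fail with !_img.empty()")
```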

asccm commented 9 months ago

Thanks! You were right.

But I still got an issue when running the demo:

```
Traceback (most recent call last):
  File "/.../MovieChat/inference.py", line 372, in
    msg = chat.upload_video_without_audio(
  File "/.../MovieChat/inference.py", line 277, in upload_video_without_audio
    video_emb, _ = self.model.encode_long_video(cur_image, middle_video)
  File "/.../MovieChat/MovieChat/models/moviechat.py", line 366, in encode_long_video
    frame_hidden_state = cur_position_embeddings + frame_hidden_state
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!
```

I tried adding '.to(device)' in several places but didn't find a solution. Do you know how to fix this?

Espere-1119-Song commented 9 months ago

It seems 'cur_position_embeddings' and 'frame_hidden_state' are not on the same device. You can check 'cur_position_embeddings.device' and 'frame_hidden_state.device' to make sure both of them are on cuda:0 (or both on cpu).
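
For example, a minimal self-contained sketch of the mismatch and the fix (dummy tensors; only the two variable names come from your traceback):

```python
import torch

# Dummy stand-ins for the two tensors from the traceback; the shapes are made up.
frame_hidden_state = torch.randn(1, 32, 768,
                                 device="cuda:0" if torch.cuda.is_available() else "cpu")
cur_position_embeddings = torch.randn(1, 32, 768)  # left on cpu -> device mismatch

print(cur_position_embeddings.device, frame_hidden_state.device)

# Fix: move the position embeddings onto the same device as the frame features
cur_position_embeddings = cur_position_embeddings.to(frame_hidden_state.device)
frame_hidden_state = cur_position_embeddings + frame_hidden_state
print(frame_hidden_state.device)
```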

asccm commented 9 months ago

Thank you! I managed to fix that error.

I ran the demo exactly as proposed:

```bash
python inference.py --cfg-path eval_configs/MovieChat.yaml --gpu-id 0 --num-beams 1 --temperature 1.0 --text-query "What is he doing?" --video-path src/examples/Cooking_cake.mp4 --fragment-video-path src/video_fragment/output.mp4 --cur-min 1 --cur-sec 1 --middle-video 1
```

with these models:

```yaml
llama_model: "ckpt/llama2/llama-2-7b-hf"
llama_proj_model: 'ckpt/minigpt4/pretrained_minigpt4.pth'
ckpt: "ckpt/finetune-vicuna7b-v2.pth"
```

but I get the following output:

```
Moviepy - Done !
Moviepy - video ready src/video_fragment/output.mp4
The question is the first step.##The first step:The following question is the first step.The first step:The following question is the first step.
```

The answer is off ... do you know how to solve this issue?

Espere-1119-Song commented 9 months ago

When I use the same video and a similar question, I don't run into this problem. Maybe you should check the versions of pretrained_minigpt4.pth and llama. We use llama-2-7b-chat as the LLM decoder, and it does not need to be merged with vicuna.
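
For example, the relevant yaml entries could then look like this (the checkpoint path below is only an illustration of where the chat model might live locally; the other two entries are copied unchanged from your config):

```yaml
llama_model: "ckpt/llama2/Llama-2-7b-chat-hf"   # llama-2-7b-chat, not the base llama-2-7b-hf
llama_proj_model: 'ckpt/minigpt4/pretrained_minigpt4.pth'
ckpt: "ckpt/finetune-vicuna7b-v2.pth"
```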