rese1f / MovieChat

[CVPR 2024] MovieChat: From Dense Token to Sparse Memory for Long Video Understanding
https://rese1f.github.io/MovieChat/
BSD 3-Clause "New" or "Revised" License
534 stars, 41 forks

About ablation study on memory mechanism #52

Open liziming5353 opened 7 months ago

liziming5353 commented 7 months ago

How is the model without the MM module implemented in the ablation experiment? Does it apply the merge algorithm directly to the entire video?

Espere-1119-Song commented 7 months ago

We simply select several frames and feed them into the LLM decoder without the merge algorithm.
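
A minimal sketch of what "select several frames" could look like, assuming the frames are spread evenly over the video. The reply does not say how the frames are chosen, so uniform sampling is an assumption, and `sample_frame_indices` is a hypothetical helper rather than a function from the MovieChat code base.

```python
def sample_frame_indices(total_frames: int, num_samples: int) -> list[int]:
    """Pick num_samples frame indices spread evenly across a video.

    Hypothetical helper: uniform sampling is an assumption, not the
    confirmed setup of the MovieChat ablation.
    """
    if total_frames <= 0 or num_samples <= 0:
        raise ValueError("total_frames and num_samples must be positive")
    if num_samples >= total_frames:
        # Fewer frames than requested: use every frame once.
        return list(range(total_frames))
    # Split the video into equal segments and take the middle frame of each.
    segment = total_frames / num_samples
    return [int((i + 0.5) * segment) for i in range(num_samples)]
```

The selected frames would then be encoded and passed straight to the decoder, with no merging step in between.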

liziming5353 commented 7 months ago

Got it. What's the difference between video_path and fragment_video_path? As I understand it, video_path is the path to the video to be processed. But in the upload_video_without_audio function in chat_model.py, fragment_video_path is passed as a parameter to the load_video function.

Espere-1119-Song commented 7 months ago

fragment_video_path stores the video clips read by the sliding window.
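
As a rough illustration of reading a video clip by clip, the sketch below yields the frame range of each window. The non-overlapping windows, the frame-index units, and the name `sliding_windows` are all assumptions made for illustration; the thread does not describe the actual window size or stride.

```python
from typing import Iterator


def sliding_windows(total_frames: int, window_size: int) -> Iterator[tuple[int, int]]:
    """Yield (start, end) frame ranges, end exclusive, covering the video.

    Hypothetical helper: windows here do not overlap, which is an
    assumption rather than the documented MovieChat behaviour.
    """
    if total_frames < 0 or window_size <= 0:
        raise ValueError("total_frames must be >= 0 and window_size > 0")
    for start in range(0, total_frames, window_size):
        # The last window is shortened so it never runs past the video.
        yield start, min(start + window_size, total_frames)
```

Under this reading, each window would be written out as a short clip at fragment_video_path and then loaded, so the same temporary file is reused for every window.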

liziming5353 commented 7 months ago

So do I need to prepare the video clips in advance, or will they be generated automatically?

Espere-1119-Song commented 7 months ago

No need, it will be generated automatically.

liziming5353 commented 7 months ago

Where is it generated? I can't find it. The first place fragment_video_path is used seems to be as a parameter of load_video in the upload_video_without_audio function.

liziming5353 commented 7 months ago

image

Espere-1119-Song commented 7 months ago

You can run it and print the path to see :)

liziming5353 commented 7 months ago

I have run it. I set fragment_video_path to "~/video_frames_moviechat", which is an empty folder. An error occurred:

image

Espere-1119-Song commented 7 months ago

Because fragment_video_path needs to be an mp4 file, not a directory :)
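
A small pre-flight check along these lines could catch the mistake before the model runs. This is only a sketch: `validate_fragment_video_path` is a hypothetical helper, and the requirement that the parent folder already exists is an assumption, not something the maintainers stated.

```python
from pathlib import Path


def validate_fragment_video_path(path_str: str) -> Path:
    """Return the expanded path if it can serve as a temporary .mp4 file.

    Hypothetical helper, not part of MovieChat.
    """
    path = Path(path_str).expanduser()  # resolve a leading "~"
    if path.is_dir():
        raise ValueError(f"{path} is a directory; expected a path to an .mp4 file")
    if path.suffix.lower() != ".mp4":
        raise ValueError(f"{path} does not end in .mp4")
    if not path.parent.is_dir():
        # Assumption: the folder that will hold the temporary file must exist.
        raise ValueError(f"parent folder {path.parent} does not exist")
    return path
```

With the setting from the comment above, passing "~/video_frames_moviechat" would raise the first error, while something like "~/video_frames_moviechat/tmp.mp4" (an illustrative file name) would pass.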

liziming5353 commented 7 months ago

So fragment_video_path and video_path are the same video?

Espere-1119-Song commented 7 months ago

No, fragment_video_path is a temporary mp4 file.

liziming5353 commented 7 months ago

But I only have one video to be processed and you said that fragment_video_path will be generated. So I am confused... Could you give me a sample?

liziming5353 commented 7 months ago

It seems to be a bug in the PyPI code. In the GitHub code, the capture_video function writes the tmp video file and returns the path. But in the PyPI code, the capture_video function does not write the tmp video file but still returns the path, which is why the error above occurred.
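
If the behaviour is as described, a guard like the one below, run on the returned path before it is loaded, would surface the problem with a clearer message. It is a sketch under that assumption: `require_written_fragment` is a hypothetical helper and does not call any MovieChat function.

```python
from pathlib import Path


def require_written_fragment(path_str: str) -> Path:
    """Fail early if the temporary clip was never written to disk.

    Hypothetical helper, not part of MovieChat.
    """
    path = Path(path_str).expanduser()
    if not path.is_file():
        raise FileNotFoundError(
            f"{path} was returned as the fragment path but no file was written there"
        )
    if path.stat().st_size == 0:
        raise ValueError(f"{path} exists but is empty")
    return path
```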