
rese1f / MovieChat

[CVPR 2024] MovieChat: From Dense Token to Sparse Memory for Long Video Understanding

https://rese1f.github.io/MovieChat/

BSD 3-Clause "New" or "Revised" License


Run long video #53

Open shuyansy opened 7 months ago

shuyansy commented 7 months ago

Thanks for your work. I want to test my own long video, so I set middle_video to 0. However, I ran into this error. Could you help me find a solution? Thanks so much!

[Screenshot 2024-04-26 9:24:11 PM]

Espere-1119-Song commented 7 months ago

I think I fixed this bug before. Can you tell me which code you are using? The error often occurs when the long-term memory is not initialized when uploading a new long video.
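
To illustrate the failure mode described above, here is a minimal sketch of resetting memory buffers before each new video. The `VideoMemory` class, its `reset` and `add_frame` methods, and `process_video` are hypothetical names for illustration only; they are not MovieChat's actual API, and the real fix may look different.

```python
class VideoMemory:
    """Hypothetical stand-in for a model's short- and long-term memory buffers."""

    def __init__(self):
        self.short_term = []
        self.long_term = []

    def reset(self):
        # Drop everything left over from a previously processed video.
        self.short_term.clear()
        self.long_term.clear()

    def add_frame(self, feature):
        self.short_term.append(feature)


def process_video(memory, frame_features):
    # Reinitialize memory first so stale state from an earlier video
    # cannot leak into this one (assumed cause of the error above).
    memory.reset()
    for feature in frame_features:
        memory.add_frame(feature)
    return len(memory.short_term)
```

Calling `process_video` twice on the same `VideoMemory` object then starts each run from empty buffers instead of carrying over the previous video's state.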

shuyansy commented 7 months ago

Hi, I just ran the main branch. I also found "# middle_video = middle_video == 1" on line 368 of inference.py, so I deleted it. Do you have any ideas? Thanks so much for your reply!

Espere-1119-Song commented 7 months ago

I think line 368 is not commented out.

shuyansy commented 7 months ago

Thanks so much! The code runs normally now. However, I want to read more frames, so I adjusted "N_SAMPLES = 128". When I use a larger number, the error still occurs. Do you have any ideas? Thanks for your time and response!

Espere-1119-Song commented 7 months ago

I think you need to check the actual length of positional embedding.

shuyansy commented 7 months ago

Thanks for the response. I expanded "n_position = 16" to "n_position = 32" so it can support more frames. However, I found that the max value is 32 because of the size of the pretrained model, so the maximum number of frames the model can take is about 500. Is my understanding right?

Espere-1119-Song commented 7 months ago

I think you can explore it by checking the length of the long-term memory :). The maximum number of frames read by the model is len(short-term memory) * len(long-term memory) / 2.
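
As a quick worked example of the formula above, the sketch below computes the frame limit from the two memory lengths. The function name `max_frames` and the example lengths are placeholders chosen for illustration, not MovieChat's actual defaults; check the real values in your checkout.

```python
def max_frames(short_term_len: int, long_term_len: int) -> int:
    """Frame limit from the formula in this thread:
    len(short-term memory) * len(long-term memory) / 2."""
    return short_term_len * long_term_len // 2


# Placeholder lengths, not the repository's defaults.
print(max_frames(32, 32))  # 512
```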

shuyansy commented 7 months ago

Thanks for the reminder! The last thing I want to check is whether my hyperparameter setting is right if I want to read the whole long video: cur_sec=1 cur_min=1.