rese1f / MovieChat

[CVPR 2024] MovieChat: From Dense Token to Sparse Memory for Long Video Understanding
https://rese1f.github.io/MovieChat/
BSD 3-Clause "New" or "Revised" License
524 stars 41 forks

Training details #6

Closed wanghao-cst closed 1 year ago

wanghao-cst commented 1 year ago

Awesome work! Will you share the training or fine-tuning code?

rese1f commented 1 year ago

Thank you for your interest. Please see the closed issue #2:
"Our method is training-free; you can implement this mechanism in any model."
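
For readers wondering what a training-free memory mechanism of this kind might look like, here is a minimal sketch. It assumes the mechanism amounts to consolidating dense per-frame features into a shorter memory by repeatedly averaging the most similar pair of adjacent frames, with no learned parameters. The names `cosine` and `consolidate` are hypothetical, each frame is simplified to a single feature vector, and this is not the repository's actual code.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # Treat a zero vector as dissimilar to everything.
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def consolidate(frames, target_len):
    """Shrink a list of frame features to target_len entries.

    Hypothetical helper: greedily merges the most similar pair of
    adjacent frames (by averaging) until target_len frames remain.
    Nothing here is trained, so it can sit in front of any model.
    """
    if target_len < 1:
        raise ValueError("target_len must be at least 1")
    frames = [list(f) for f in frames]  # copy so the input is not mutated
    while len(frames) > target_len:
        # Similarity of each frame to its right-hand neighbour.
        sims = [cosine(frames[i], frames[i + 1]) for i in range(len(frames) - 1)]
        i = max(range(len(sims)), key=sims.__getitem__)
        merged = [(x + y) / 2 for x, y in zip(frames[i], frames[i + 1])]
        frames[i:i + 2] = [merged]  # replace the pair with its average
    return frames


# Four toy "frames": two near-duplicate scenes collapse into two memory slots.
memory = consolidate([[1, 0], [1, 0], [0, 1], [0, 1]], target_len=2)
```

Because the merge step only uses similarity and averaging, it needs no gradient updates, which is consistent with the "training-free" description; how the real project selects and merges tokens may differ.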

wanghao-cst commented 1 year ago

Thank you for the reply. May I ask which baseline model you built on? It looks like MiniGPT-4, but the paper discusses many MLLMs.

rese1f commented 1 year ago

We built our model on Video-LLaMA, since it is a simple but strong video-based MLLM.

wanghao-cst commented 1 year ago

Thank you.