rese1f / MovieChat

[CVPR 2024] MovieChat: From Dense Token to Sparse Memory for Long Video Understanding
https://rese1f.github.io/MovieChat/
BSD 3-Clause "New" or "Revised" License

About Custom Dataset #60

Open foxbeing7 opened 6 months ago

foxbeing7 commented 6 months ago

Nice work! As the title states, if I want to build my own dataset, how do I extract visual features and generate image captions and video captions? Thank you.

Espere-1119-Song commented 6 months ago

The code works on raw videos, so you can supply the video paths from your own dataset directly.
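Since the code takes raw video paths, a custom dataset mostly comes down to gathering those paths. The following is a minimal sketch of that step, not part of the MovieChat repository: the helper name `collect_video_paths` and the extension list are assumptions for illustration, and how the resulting paths are handed to MovieChat depends on the repository's own scripts.

```python
from pathlib import Path

# Assumed set of video file extensions; adjust to match your dataset.
VIDEO_EXTENSIONS = {".mp4", ".avi", ".mkv", ".mov", ".webm"}


def collect_video_paths(root, extensions=VIDEO_EXTENSIONS):
    """Return a sorted list of video file paths found recursively under root.

    Hypothetical helper: it only lists files, it does not call MovieChat.
    """
    root = Path(root)
    return sorted(
        str(p)
        for p in root.rglob("*")
        # Match on the lowercased suffix so ".MP4" and ".mp4" both count.
        if p.is_file() and p.suffix.lower() in extensions
    )


if __name__ == "__main__":
    import sys

    # Print one path per line for the directory given on the command line.
    dataset_root = sys.argv[1] if len(sys.argv) > 1 else "."
    for video_path in collect_video_paths(dataset_root):
        print(video_path)
```

Each returned path can then be passed to the inference code one at a time, with no separate feature-extraction or captioning step needed beforehand.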

foxbeing7 commented 6 months ago

Thanks. If I plan to fine-tune MovieChat on my own dataset, how should I build the dataset, and do you have any advice for training?

Espere-1119-Song commented 6 months ago

MovieChat is training-free. For more training details, please refer to https://github.com/DAMO-NLP-SG/Video-LLaMA