huangb23 / VTimeLLM

[CVPR'2024 Highlight] Official PyTorch implementation of the paper "VTimeLLM: Empower LLM to Grasp Video Moments".
https://arxiv.org/pdf/2311.18445.pdf

About Video Feature Project #35

Open xiaokj37 opened 1 month ago

xiaokj37 commented 1 month ago

First of all, thank you very much for open-sourcing your work. According to your paper, VTimeLLM projects the image CLS token into the LLM embedding space. Could you point me to where this is implemented in the code? Looking forward to your reply.

huangb23 commented 3 weeks ago

For training, we pre-extract the CLS embedding of each frame and project it with the mm_projector in the class VTimeLLMMetaModel. The relevant code is in model/vtimellm_arch.py. For the code that extracts the CLS embedding, see inference.py.
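
For readers following along, here is a minimal sketch of the projection step described above. It is not the repository's code: it assumes the projector is a single linear layer, and the class name FrameProjector, the feature size (768), the LLM hidden size (4096), and the frame count (100) are illustrative assumptions. Random tensors stand in for the pre-extracted per-frame CLS embeddings. Check model/vtimellm_arch.py for the actual definition.

```python
import torch
import torch.nn as nn


class FrameProjector(nn.Module):
    """Hypothetical stand-in for a model that owns an mm_projector."""

    def __init__(self, feature_dim: int, hidden_size: int):
        super().__init__()
        # Assumption: the projector is one linear layer mapping
        # visual features to the LLM embedding size.
        self.mm_projector = nn.Linear(feature_dim, hidden_size)

    def forward(self, cls_features: torch.Tensor) -> torch.Tensor:
        # cls_features: (batch, num_frames, feature_dim)
        # returns:      (batch, num_frames, hidden_size)
        return self.mm_projector(cls_features)


torch.manual_seed(0)

# Illustrative sizes only; these are assumptions, not values from the thread.
feature_dim, hidden_size, num_frames = 768, 4096, 100

projector = FrameProjector(feature_dim, hidden_size)

# Random data standing in for pre-extracted per-frame CLS embeddings.
cls_features = torch.randn(1, num_frames, feature_dim)

# Each frame becomes one vector in the LLM embedding space.
video_tokens = projector(cls_features)
print(video_tokens.shape)
```

Each frame contributes exactly one projected vector, so the video is represented by num_frames vectors that can be placed alongside the text token embeddings.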