Can I simply query the model to locate the `highlight moment or the best moment` in the video?

huangb23 / VTimeLLM

[CVPR'2024 Highlight] Official PyTorch implementation of the paper "VTimeLLM: Empower LLM to Grasp Video Moments".

https://arxiv.org/pdf/2311.18445.pdf

Can I simply query the model to locate the `highlight moment or the best moment` in the video? #17

Closed: dragen1860 closed this issue 7 months ago

dragen1860 commented 8 months ago

Hi, dear author: thanks for publishing such insightful work. After reading your paper, I realise the training overhead is really small; a single GPU is enough to reproduce your work, which is really neat.
I would like to use your model to detect the highlight moments in my own videos. How well does it perform on that task? Thank you.

huangb23 commented 8 months ago

I haven't attempted to perform a highlight detection task with this model before. Perhaps you might need to fine-tune it using similar data processing methods. Of course, you could also try directly performing it in a zero-shot manner. Feel free to share your results!

dragen1860 commented 7 months ago

Hi, I tried your model on my own videos. To be honest, sometimes the timestamp is right but the description is not precise; sometimes the description is correct but the timestamp is far from the ground truth. There is a big gap between the open-source model and Gemini 1.5. Anyway, you still did a good job, thank you.
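
For anyone who wants to put a number on how far a predicted timestamp is from the ground truth, one common choice is the temporal IoU between the predicted and annotated segments. The snippet below is only a minimal sketch: the function name `temporal_iou` and the `(start, end)` tuples in seconds are assumptions made for illustration, not part of the VTimeLLM code base.

```python
def temporal_iou(pred, gt):
    """Temporal IoU of two (start, end) segments given in seconds.

    Hypothetical helper for illustration; not taken from the VTimeLLM repo.
    """
    pred_start, pred_end = pred
    gt_start, gt_end = gt
    # Length of the overlap, clamped at zero when the segments are disjoint.
    intersection = max(0.0, min(pred_end, gt_end) - max(pred_start, gt_start))
    # Union = sum of both lengths minus the overlap counted twice.
    union = (pred_end - pred_start) + (gt_end - gt_start) - intersection
    return intersection / union if union > 0 else 0.0


# Example: the model predicts 10s-20s, the annotated highlight is 15s-25s.
print(round(temporal_iou((10.0, 20.0), (15.0, 25.0)), 3))
```

A score of 1.0 means the two segments coincide and 0.0 means they do not overlap at all, so averaging it over a handful of your own annotated videos gives a rough picture of timestamp quality separately from description quality.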