Few-shot video classification

mbzuai-oryx / Video-ChatGPT

[ACL 2024 🔥] Video-ChatGPT is a video conversation model capable of generating meaningful conversation about videos. It combines the capabilities of LLMs with a pretrained visual encoder adapted for spatiotemporal video representation. We also introduce a rigorous 'Quantitative Evaluation Benchmarking' framework for video-based conversational models.

https://mbzuai-oryx.github.io/Video-ChatGPT

Creative Commons Attribution 4.0 International


Few-shot video classification #42

sevakon closed this issue 8 months ago

sevakon commented 1 year ago

Thanks a lot for this exciting work!

I have a general question: would the proposed approach work well for few-shot video understanding/classification? On the technical side, it should be possible to provide multiple videos with textual descriptions as part of the prompt. I am wondering whether the currently trained model would handle the ambiguity of this new few-shot setup. Have you guys tried anything in the few-shot direction, or do you have any intuition about whether this might work or would require further training?

mmaaz60 commented 12 months ago

Hi @sevakon,

Thank you for your interest in our work, and apologies for the delayed response. Technically, Video-ChatGPT may be used for few-shot video understanding/classification using in-context examples. However, we did not explore this application. Please do share any findings you may have in this regard. Thank you.
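A minimal sketch of the in-context setup discussed above, written in plain Python without calling Video-ChatGPT itself: labelled support videos are interleaved with their class names, the query video comes last with its label left blank, and the model's free-text answer is mapped back to a known class. `VIDEO_PLACEHOLDER`, `Example`, `build_few_shot_prompt`, and `parse_label` are hypothetical names that are not part of the Video-ChatGPT codebase, and this thread does not confirm whether the released model accepts several videos in a single prompt.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

# HYPOTHETICAL placeholder marking where one video's visual features would be
# spliced into the text; the token the real model expects may differ.
VIDEO_PLACEHOLDER = "<video>"


@dataclass
class Example:
    """A labelled support example: a video identifier plus its class name."""
    video_id: str
    label: str


def build_few_shot_prompt(
    support: Sequence[Example], query_video_id: str, labels: Sequence[str]
) -> Tuple[str, List[str]]:
    """Interleave labelled support videos, then leave the query's label blank.

    Returns the prompt and the video ids in placeholder order, so the caller
    can supply video features in the same order.
    """
    blocks = ["Classify the last video as one of: " + ", ".join(labels) + "."]
    order: List[str] = []
    for ex in support:
        blocks.append(f"Video: {VIDEO_PLACEHOLDER}\nLabel: {ex.label}")
        order.append(ex.video_id)
    # The query video goes last; the model is expected to complete its label.
    blocks.append(f"Video: {VIDEO_PLACEHOLDER}\nLabel:")
    order.append(query_video_id)
    return "\n\n".join(blocks), order


def parse_label(answer: str, labels: Sequence[str]) -> Optional[str]:
    """Map a free-form answer back to a known label, or None if none matches."""
    text = answer.strip().lower()
    # Try longer labels first so "running" is not mistaken for "run".
    for label in sorted(labels, key=len, reverse=True):
        if text.startswith(label.lower()):
            return label
    return None


if __name__ == "__main__":
    labels = ["cooking", "swimming", "dancing"]
    support = [Example("clip_a", "cooking"), Example("clip_b", "swimming")]
    prompt, order = build_few_shot_prompt(support, "clip_q", labels)
    print(prompt)
    print(order)  # ['clip_a', 'clip_b', 'clip_q']
    print(parse_label(" Dancing.", labels))  # dancing
```

Returning the video order alongside the prompt keeps the support and query features aligned with their placeholders, and matching longer labels first avoids reading an answer such as "running" as "run".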