sevakon closed this issue 8 months ago
Hi @sevakon,
Thank you for your interest in our work, and apologies for the delayed response. In principle, Video-ChatGPT could be used for few-shot video understanding/classification with in-context examples. However, we have not explored this application. Please do share any findings you have in this regard. Thank you.
Thanks a lot for this exciting work!
I have a general question: would the proposed approach work well for some sort of few-shot video understanding / classification? From the technical side of things, it should be possible to provide multiple videos with textual descriptions as part of the prompt. I am wondering whether the currently trained model would handle the ambiguity of this new, few-shot setup. Have you tried anything in the few-shot direction, or do you have any intuition on whether this might work or would require further training?
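To make the idea concrete, here is a minimal sketch of how such a few-shot prompt could be assembled. It is not Video-ChatGPT's actual interface: the `<video>` placeholder, the `Example` class, and `build_few_shot_prompt` are all hypothetical names used for illustration, and whether the released model accepts more than one video per prompt is exactly the open question.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical placeholder marking where a video's features would be spliced in.
# The real model may use a different token, or allow only one video per prompt.
VIDEO_TOKEN = "<video>"


@dataclass
class Example:
    """One labeled in-context example (hypothetical helper type)."""
    video_path: str
    label: str


def build_few_shot_prompt(
    examples: List[Example], query_video: str, question: str
) -> Tuple[str, List[str]]:
    """Interleave labeled example videos with an unlabeled query video.

    Returns the prompt text and the video paths in the order their
    placeholders appear, so a caller can pair each placeholder with a video.
    """
    lines: List[str] = []
    video_order: List[str] = []
    for i, ex in enumerate(examples, start=1):
        lines.append(f"Example {i}: {VIDEO_TOKEN}")
        lines.append(f"Label: {ex.label}")
        video_order.append(ex.video_path)
    # The query video comes last and carries no label.
    lines.append(f"Query: {VIDEO_TOKEN}")
    lines.append(question)
    video_order.append(query_video)
    return "\n".join(lines), video_order


if __name__ == "__main__":
    shots = [Example("a.mp4", "cooking"), Example("b.mp4", "surfing")]
    prompt, videos = build_few_shot_prompt(shots, "q.mp4", "What activity is shown?")
    print(prompt)
    print(videos)
```

The sketch only builds the text and the ordered list of videos; feeding them to a model, and checking whether it can tell the examples apart from the query, is left open.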