yunlong10 / Awesome-LLMs-for-Video-Understanding

🔥🔥🔥 Latest papers, code, and datasets on Vid-LLMs.

Requesting to add a benchmark to this repo - VELOCITI #10

Open varungupta31 opened 2 months ago

varungupta31 commented 2 months ago

Hi there! Thanks for the effort to maintain this amazing repository.

This is a request to add our recent work on the evaluation of video-language models. We propose an evaluation benchmark, VELOCITI.

Please find the relevant details below:

Title:

VELOCITI: Can Video-Language Models Bind Semantic Concepts Through Time?

About:

Video-Language Models (VLMs) are being proposed at a rapid pace. Our primary motivation is to provide a benchmark for evaluating current state-of-the-art as well as upcoming VLMs on compositionality, a fundamental aspect of vision-language understanding. VELOCITI achieves this through carefully designed tests that evaluate various aspects of perception and binding. With this, we aim to provide a more accurate gauge of VLM capabilities, encouraging research toward improving VLMs and preventing shortcomings from percolating into systems that rely on such models.

arXiv: https://arxiv.org/abs/2406.10889v1

GitHub: https://github.com/katha-ai/VELOCITI

Project Page and Demo: https://katha-ai.github.io/projects/velociti/

Please let me know if I have missed any required details. Thanks for your time.

sai-01 commented 1 month ago

Thank you so much for sharing this information! We are planning a significant update to our survey soon, and we will include your work in the upcoming revision.