whwu95 / FreeVA

FreeVA: Offline MLLM as Training-Free Video Assistant
Apache License 2.0

Memory consumption #4

TicTacToePP closed this issue 3 months ago

TicTacToePP commented 3 months ago

Hi, I find your paper interesting. However, as a newbie in this field, I am wondering about T (the number of frames) used here. Does each dataset sample the same T, and is T always small? Also, how much more GPU memory does the proposed dense aggregation consume compared with prior sparse methods? Thanks.

whwu95 commented 3 months ago

Thank you for your interest in my work. The memory requirements for FreeVA w/ LLaVA-1.5 are as follows:

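| Method    | MLLM          | Frames | GPU Memory | MSVD-QA Acc. |
|-----------|---------------|--------|------------|--------------|
| FreeVA S1 | LLaVA-1.5 7B  | 4      | 16.1G      | 69.6         |
| FreeVA D1 | LLaVA-1.5 7B  | 4      | 19.1G      | 70.5         |
| FreeVA D1 | LLaVA-1.5 13B | 4      | 32.0G      | 71.1         |
| FreeVA D2 | LLaVA-1.5 13B | 8      | 32.0G      | 71.8         |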

The GPT-3.5-Turbo-0613 API is used for the evaluation above.
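
To make the memory question concrete, the figures in the table above can be compared directly. The snippet below is only a rough sketch of that arithmetic: it assumes that S1 and D1 denote the sparse and dense aggregation variants (the reply does not say so explicitly), and the `rows` list and `compare` helper are hypothetical names used for illustration, not part of the FreeVA code.

```python
# Rows copied from the table above:
# (method, mllm, frames, gpu_memory_in_G, msvd_qa_acc)
rows = [
    ("FreeVA S1", "LLaVA-1.5 7B", 4, 16.1, 69.6),
    ("FreeVA D1", "LLaVA-1.5 7B", 4, 19.1, 70.5),
    ("FreeVA D1", "LLaVA-1.5 13B", 4, 32.0, 71.1),
    ("FreeVA D2", "LLaVA-1.5 13B", 8, 32.0, 71.8),
]


def compare(baseline, other):
    """Return (extra memory in G, relative memory increase in %, accuracy gain)."""
    extra_mem = other[3] - baseline[3]
    rel_increase = 100.0 * extra_mem / baseline[3]
    acc_gain = other[4] - baseline[4]
    # Round to one decimal place to match the precision of the table.
    return round(extra_mem, 1), round(rel_increase, 1), round(acc_gain, 1)


# Assumption: S1 is the sparse variant and D1 the dense one (same MLLM, same frames).
extra_mem, rel_increase, acc_gain = compare(rows[0], rows[1])
print(
    f"D1 vs S1 (7B, 4 frames): +{extra_mem}G memory "
    f"(+{rel_increase}%), +{acc_gain} accuracy"
)
```

Under that reading, D1 needs about 3.0G (roughly 18.6%) more GPU memory than S1 on the 7B model at 4 frames, for a 0.9-point gain in MSVD-QA accuracy.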