showlab / videollm-online

VideoLLM-online: Online Video Large Language Model for Streaming Video (CVPR 2024)
Apache License 2.0

What is the cost of reproducing the results? #8

Closed yjhdhr closed 1 week ago

yjhdhr commented 1 week ago

Hi, if I want to replicate the training results using Ego4D data, approximately how many GPUs are needed, and how long does training take?

chenjoya commented 1 week ago

Hi, if you only want to train on Ego4D streaming narration, you may need ~12h on 8 A100 GPUs. If you train with free-form generated dialogue on Ego4D GoalStep, it should take ~18h on 8 A100 GPUs.
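For budgeting, the wall-clock figures above can be converted to GPU-hours. This is a minimal sketch that only multiplies the quoted hours by the quoted GPU count; the function name `gpu_hours` is made up for illustration, and actual cost will depend on your hardware and data pipeline.

```python
# Rough compute budget from the figures quoted in this thread.
# Only the hours (12, 18) and the GPU count (8) come from the reply;
# the helper itself is illustrative.

NUM_GPUS = 8  # 8 A100 GPUs, as quoted


def gpu_hours(wall_clock_hours, num_gpus=NUM_GPUS):
    """Total GPU-hours for a run of the given wall-clock duration."""
    return wall_clock_hours * num_gpus


narration = gpu_hours(12)  # Ego4D streaming narration only
dialogue = gpu_hours(18)   # free-form generated dialogue on Ego4D GoalStep

print(f"narration: ~{narration} A100 GPU-hours")
print(f"dialogue:  ~{dialogue} A100 GPU-hours")
```

That works out to roughly 96 and 144 A100 GPU-hours respectively.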

chenjoya commented 1 week ago

These costs are for 1+3x3 tokens per frame. If you use only 1 token per frame, then 8 x 24 GB GPUs (e.g. A5000, 3090) are enough. We have trained these models, but their performance is not very good.
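To see why the per-frame token count matters for memory, here is a small sketch of the arithmetic. It assumes "1+3x3" means 1 token plus a 3x3 grid of tokens (10 in total) per frame; the function names and the frame count of 100 are hypothetical and not taken from the repository.

```python
# Visual-token count per frame and per clip under the two settings mentioned.
# Assumption: "1+3x3" = 1 token + a 3x3 grid of tokens = 10 tokens per frame.


def tokens_per_frame(single_tokens=1, grid_side=3):
    """Tokens per frame: single tokens plus a grid_side x grid_side grid."""
    return single_tokens + grid_side * grid_side


def visual_tokens(num_frames, per_frame):
    """Total visual tokens the language model sees for a clip."""
    return num_frames * per_frame


full = tokens_per_frame()             # 1 + 3*3 tokens per frame
single = tokens_per_frame(grid_side=0)  # 1 token per frame

num_frames = 100  # hypothetical clip length, for illustration only
print(visual_tokens(num_frames, full), "tokens with 1+3x3 per frame")
print(visual_tokens(num_frames, single), "tokens with 1 per frame")
```

Under this reading, the 1-token setting feeds the model about a tenth as many visual tokens for the same clip, which is consistent with it fitting on 24 GB GPUs.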

chenjoya commented 1 week ago

Closing this issue. Feel free to reopen it if you have problems reproducing the results.