Closed yjhdhr closed 1 week ago
Hi, if you only want to train on Ego4D streaming narration, you need roughly 12 hours on 8 A100 GPUs. Training with free-form generated dialogue on Ego4D GoalStep should take about 18 hours on 8 A100 GPUs.
These figures are the cost of using 1+3x3 tokens (10 tokens) for each frame. If you use only 1 token per frame, then 8 x 24 GB GPUs (e.g. A5000, 3090) are enough. We have trained such models, but their performance is not as good.
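For a rough sense of scale, the arithmetic behind these numbers can be sketched as below. This is only a back-of-the-envelope illustration, not code from the repository: the helper names `tokens_per_frame` and `gpu_hours` are hypothetical, and reading "1+3x3" as one extra token plus a 3x3 grid of tokens is an assumption.

```python
# Back-of-the-envelope sketch; helper names are hypothetical, not from the repo.

def tokens_per_frame(extra_tokens: int = 1, grid: int = 3) -> int:
    """Tokens per frame, assuming `extra_tokens` plus a grid x grid block."""
    return extra_tokens + grid * grid


def gpu_hours(wall_clock_hours: float, num_gpus: int = 8) -> float:
    """Total GPU-hours for a run of the given wall-clock length."""
    return wall_clock_hours * num_gpus


if __name__ == "__main__":
    # 1+3x3 setting versus the 1-token-per-frame setting
    print("tokens per frame (1+3x3):", tokens_per_frame())
    print("tokens per frame (1 only):", tokens_per_frame(grid=0))
    # Estimates quoted above: ~12h and ~18h on 8 GPUs
    print("narration GPU-hours:", gpu_hours(12))
    print("GoalStep dialogue GPU-hours:", gpu_hours(18))
```

So the 1+3x3 setting feeds 10x as many visual tokens per frame as the 1-token setting, and the two runs come to roughly 96 and 144 A100 GPU-hours respectively.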
Closing this issue. Feel free to reopen it if you have problems reproducing the results.
Hi, if I want to replicate the training results using Ego4D data, approximately how many GPUs are needed, and how long does training take?