linjieli222 / HERO

Research code for EMNLP 2020 paper "HERO: Hierarchical Encoder for Video+Language Omni-representation Pre-training"
https://arxiv.org/abs/2005.00200
MIT License

Pre-training based on HowTo100M dataset #17

Closed. Unified-Robots closed this issue 3 years ago.

Unified-Robots commented 3 years ago

Following your paper, I segmented the HowTo100M videos into 60s clips and also processed the dataset's caption.json to match the segmented clips. When I pre-train the model, I get a "CUDA out of memory" error. I suspect there are too many subtitles in HowTo100M. How can I solve this problem?
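For context, a minimal sketch of how such caption re-bucketing might look. The field names `start`, `end`, and `text` and the single-file layout are assumptions about the HowTo100M caption format, not the exact schema or code used here:

```python
import json

CLIP_LEN = 60.0  # assumed clip length in seconds, matching the 60s segmentation


def split_captions(caption_path, out_path):
    """Re-bucket per-video captions into 60s clip windows.

    Assumes caption_path maps each video id to a list of subtitle
    segments with `start`, `end`, and `text` fields (times in seconds).
    """
    with open(caption_path) as f:
        captions = json.load(f)

    clip_captions = {}
    for vid, segments in captions.items():
        for seg in segments:
            clip_idx = int(seg["start"] // CLIP_LEN)
            clip_id = f"{vid}_{clip_idx}"
            # Shift timestamps so they are relative to the clip start
            # and clamp the end time to the clip boundary.
            shifted = {
                "start": seg["start"] - clip_idx * CLIP_LEN,
                "end": min(seg["end"], (clip_idx + 1) * CLIP_LEN) - clip_idx * CLIP_LEN,
                "text": seg["text"],
            }
            clip_captions.setdefault(clip_id, []).append(shifted)

    with open(out_path, "w") as f:
        json.dump(clip_captions, f)
```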

linjieli222 commented 3 years ago

You can try lowering the batch size. For the released pre-trained weights, the experiments were run on 16x 32GB V100 GPUs, and we did not encounter "CUDA out of memory" errors with the provided config.

Remember that if you lower the batch size, you will need to either increase the gradient accumulation steps or the total number of training steps.
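For example, keeping the effective batch size constant could look like the sketch below. The key names (`train_batch_size`, `gradient_accumulation_steps`, `num_train_steps`) are illustrative and may not match HERO's actual config keys:

```python
# Hypothetical config values, not the released HERO settings.
original = {
    "train_batch_size": 64,
    "gradient_accumulation_steps": 2,
    "num_train_steps": 100000,
}


def rescale_for_memory(cfg, shrink_factor):
    """Divide the per-step batch size by `shrink_factor` and compensate by
    multiplying the gradient accumulation steps, so the effective batch size
    (train_batch_size * gradient_accumulation_steps) stays the same."""
    cfg = dict(cfg)
    cfg["train_batch_size"] //= shrink_factor
    cfg["gradient_accumulation_steps"] *= shrink_factor
    return cfg


smaller = rescale_for_memory(original, shrink_factor=2)
# Effective batch size: 64 * 2 == 32 * 4, so each optimizer update sees
# the same number of examples; alternatively, increase num_train_steps.
print(smaller)
```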

Hope it helps. Thanks.

Unified-Robots commented 3 years ago

@linjieli222 Thanks for your reply. We will check our code for processing the caption.json file of HowTo100M.

linjieli222 commented 3 years ago

Closed due to inactivity.