linjieli222 / HERO

Research code for EMNLP 2020 paper "HERO: Hierarchical Encoder for Video+Language Omni-representation Pre-training"

https://arxiv.org/abs/2005.00200

MIT License

230 stars, 34 forks

OOM in pretraining #30

Closed: hgzjy25 closed this issue 2 years ago

hgzjy25 commented 3 years ago

I tried to pretrain the HERO model from scratch on the HowTo100M and TV datasets. The code worked well at the beginning but crashed after thousands of iterations. I found that memory usage kept growing during training until the process ran out of memory. Have you encountered this problem?

Liu0329 commented 3 years ago

I am also encountering the same problem. @linjieli222

linjieli222 commented 3 years ago

@Liu0329 @hgzjy25

I have received similar reports about this issue. However, we did not encounter it during our experiments. You may need to search online for potential solutions; sorry for the inconvenience. If you do find a solution, please come back and post it here to help others who run into the same problem.

linjieli222 commented 3 years ago

One potential direction: check whether the memory growth is due to caching. If so, you can force the cache to be cleared periodically.
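
A minimal sketch of the suggestion above, assuming a PyTorch training loop. The helper names (`memory_snapshot`, `clear_caches`, `maybe_clear_caches`) and the 1000-step interval are hypothetical and do not come from the HERO code base. The idea is to log memory counters at intervals to see what is growing, and to run the garbage collector and release PyTorch's cached CUDA blocks on the same schedule.

```python
import gc

try:
    import torch  # optional: the sketch falls back to Python-only cleanup
except ImportError:
    torch = None


def memory_snapshot():
    """Return a few counters that help tell caching apart from a real leak."""
    snap = {"python_objects": len(gc.get_objects())}
    if torch is not None and torch.cuda.is_available():
        # Bytes held by live tensors on the current device.
        snap["cuda_allocated"] = torch.cuda.memory_allocated()
        # Bytes held by the caching allocator (live tensors plus cached blocks).
        snap["cuda_reserved"] = torch.cuda.memory_reserved()
    return snap


def clear_caches():
    """Run the Python garbage collector and release unused cached CUDA blocks."""
    collected = gc.collect()
    if torch is not None and torch.cuda.is_available():
        torch.cuda.empty_cache()
    return collected


def maybe_clear_caches(step, every=1000):
    """Clear caches once every `every` steps; return True if cleanup ran."""
    if step > 0 and step % every == 0:
        clear_caches()
        return True
    return False
```

Inside the training loop, one could call `maybe_clear_caches(step)` after the optimizer step and print `memory_snapshot()` at the same interval. If `cuda_reserved` grows while `cuda_allocated` stays flat, caching is the likely cause and periodic clearing should help. If `cuda_allocated` or `python_objects` keeps growing, something is still holding references (for example, loss tensors accumulated without `.item()`), and clearing the cache will not fix it.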