tensorflow / models

Models and examples built with TensorFlow

tcmalloc: large alloc on Colab and Tensorflow killed on local machine due to over consumption of RAM #7652

Open arunumd opened 5 years ago

arunumd commented 5 years ago

System information

  1. Reduce the batch size
  2. Change the optimizer from adam to momentum

However, none of these suggestions helped to solve the problem.

Source code / logs

The error log is very long, so I am attaching it as a separate text file: ERROR_LOG.txt

rolba commented 4 years ago

Hello. Make sure you have reduced your batch size enough. I had the same issue with my code: https://github.com/rolba/ai-nimals/blob/master/ai_nimals_train_alexnet.py Reducing the batch size to 32 for the generators did the job. I also watched my RAM usage during training with htop in the console. When swap started to fill up, that was a sign to me that my batch size was too large.
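The swap check described above can also be scripted instead of watched in htop. The following is only a rough sketch for Linux, assuming the standard `/proc/meminfo` layout with `SwapTotal` and `SwapFree` fields. The function names and the 0.5 threshold are hypothetical choices for illustration, not something taken from the linked repository.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of {field: integer value}.

    Most fields are reported in kB; the unit suffix is ignored here.
    """
    info = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines that are not "Name: value" pairs
        parts = rest.split()
        if parts and parts[0].isdigit():
            info[name.strip()] = int(parts[0])
    return info


def swap_used_fraction(info):
    """Return the fraction of swap in use, or 0.0 if no swap is configured."""
    total = info.get("SwapTotal", 0)
    free = info.get("SwapFree", 0)
    if total == 0:
        return 0.0
    return (total - free) / total


def memory_pressure(info, swap_threshold=0.5):
    """Hypothetical heuristic: report pressure once swap use reaches the threshold."""
    return swap_used_fraction(info) >= swap_threshold
```

On a Linux machine, the text passed to `parse_meminfo` would typically be the contents of `/proc/meminfo`, read once per epoch or every few hundred steps. If `memory_pressure` starts returning `True` during training, that matches the symptom described above and suggests trying a smaller batch size.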

You can find HDF5 generators on my GitHub account. Please check them, use them, and let me know if you are still having problems.
Br. Pawel
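To illustrate why a generator with a small batch size keeps memory bounded, here is a minimal sketch of the general idea. It is not the generator from the repository linked above: `batch_generator` and `load_sample` are hypothetical names, and `load_sample` stands in for whatever reads a single example from disk (for instance, one row of an HDF5 file).

```python
def batch_generator(load_sample, num_samples, batch_size=32):
    """Yield lists of at most batch_size samples, loading each sample lazily.

    Only the samples of the current batch are loaded by this function,
    so peak memory scales with batch_size rather than with num_samples.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, num_samples, batch_size):
        stop = min(start + batch_size, num_samples)
        yield [load_sample(i) for i in range(start, stop)]


# Example with a stand-in loader (an assumption; a real one would read from disk).
def fake_load(index):
    return index * 2


batch_sizes = [len(batch) for batch in batch_generator(fake_load, 70, batch_size=32)]
```

With 70 samples and a batch size of 32, the generator yields two full batches followed by a final batch of 6. Halving `batch_size` roughly halves the memory held by each batch, which is why it is usually the first setting to lower when RAM runs out.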

PrakashSuthar commented 4 years ago

Hello, I get the tcmalloc error very often when running code on Colab from Python files (say, train.py), but the same code (the content of train.py copied into a cell) gives no such error when run from the cell. I would like to know the cause of this behaviour.

ravikyram commented 4 years ago

@arunumd

Is this still an issue? Please close this thread if your issue was resolved. Thanks!

arunumd commented 4 years ago

@ravikyram Yes, this is still the same issue.

ravikyram commented 4 years ago

@arunumd

Please let us know which pretrained model you are using and share the related code. Thanks!

entorius commented 3 years ago

For example, this issue still persists when I try to run the model at https://github.com/dorarad/gansformer. I'm using TensorFlow 1.15.0 on Google Colab with a GPU.