ebranda opened this issue 5 years ago
I'm getting the same error.
I am also getting the same error.
I solved this problem by reducing the batch size, number of iterations, etc.
I am also getting the same error.
I solved this problem by reducing the batch size, number of iterations, etc.
Did you manage to get this code running?
I'm getting the same error. Can you please say exactly what changes you made?
@xiaowangzi6668 I'm still getting the same error. Can you please tell me exactly what changes need to be made? Thanks.
I reduced the batch size and the number of iterations and am still getting the error. Can you please tell me exactly what changes you made? Thanks.
@manvirvirk A resource exhausted error literally means you have used up all the available RAM in your local environment. Try training in a better environment. It will work if you set the batch size to an extremely small value, such as 2.
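For reference, in TF 1.x (the reporter is on 1.14) you can also tell the session to allocate GPU memory on demand rather than grabbing it all up front. This is a minimal sketch of that config, not code from this repo, and it only helps with pre-allocation and fragmentation; a genuinely too-large batch still needs to be reduced.

```python
import tensorflow as tf

# Sketch (assumption, not code from BigGAN-Tensorflow): let the bfc allocator
# grow GPU memory on demand instead of reserving it all at session creation.
config = tf.ConfigProto()
config.gpu_options.allow_growth = True

with tf.Session(config=config) as sess:
    sess.run(tf.global_variables_initializer())
    # ... run the training ops here, with a reduced batch_size ...
```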
Nope, it's not like that. I am using Google Colab Pro with 25 GB of RAM (the RAM is not even fully occupied), and I still get this error.
Actually, it depends on your GPU architecture. Internally, a GPU contains different types of cores (e.g., TF32 and FP64 units). When these resources are not enough for the work (threads) that CUDA assigns, you get an OOM (Out of Memory) error.
@ebranda Solution -> buy a new GPU (one or more) with a larger number of CUDA cores, or reduce the batch size step by step until the error goes away [batch sizes like 128, 64, 32, 16, 8, 4, 2]; see the sketch below the NOTE.
NOTE: Reducing the batch size may significantly affect your model's quality, since BigGAN is reported to give better results with large batch sizes (that's why the default batch size is 2048).
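A minimal sketch of the "walk the batch size down" advice above (my own helper, not part of BigGAN-Tensorflow; `run_one_step` is a hypothetical callable you would write to build the graph and run one training step at a given batch size):

```python
import tensorflow as tf

# Try candidate batch sizes from large to small and keep the first one whose
# training step does not run out of GPU memory.
def find_working_batch_size(run_one_step, candidates=(128, 64, 32, 16, 8, 4, 2)):
    for bs in candidates:
        try:
            run_one_step(bs)              # caller builds the graph and runs one step
            return bs
        except tf.errors.ResourceExhaustedError:
            tf.reset_default_graph()      # discard the failed graph before retrying
    raise RuntimeError("even batch size 2 ran out of memory")
```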
Thanks for contributing this. About a minute into each training run I am receiving the following error, after which the program exits: (1) Resource exhausted: OOM when allocating tensor with shape[256,192,64,64] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
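For context, the shape in that message already implies a very large allocation; a rough back-of-the-envelope calculation (my own arithmetic, not repo code):

```python
# A single [256, 192, 64, 64] float32 activation is roughly 0.75 GiB, and
# training keeps many such activations plus gradients alive at the same time.
elems = 256 * 192 * 64 * 64      # batch * channels * height * width
size_gib = elems * 4 / 2**30     # float32 = 4 bytes per element
print(size_gib)                  # ~0.75 GiB for this one tensor
```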
Also, when initializing, the program reports the following: [] Reading checkpoints... [] Failed to find a checkpoint [!] Load failed... but it continues to run.
I have reduced batch_size to 256 and img_size to 128 and the error persists. Running TensorFlow version 1.14.0.
Any ideas?
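As a first diagnostic (editor's note: this is a standard TF 1.x utility, not code from this repo), it helps to print how much GPU memory TensorFlow actually sees, since the GPU_0_bfc allocator error above is about GPU memory rather than system RAM:

```python
from tensorflow.python.client import device_lib

# List local devices and report each GPU's usable memory in GiB; compare this
# against the size of the tensors the model tries to allocate.
for d in device_lib.list_local_devices():
    if d.device_type == 'GPU':
        print(d.name, d.memory_limit / 2**30, 'GiB')
```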