Closed tsing90 closed 6 years ago
@tsing90 set batch_size = 1 or set channel_rate < 64, for example channel_rate = 16
@leftthomas Thanks for your reply. batch_size = 1 threw an error as well ('int' object has no attribute 'shape'), and setting channel_rate = 16 gave no error, but the Python kernel's memory kept growing (beyond 10 GB) and never stopped.
@tsing90 TensorFlow occupies 100% of the GPU memory by default when you use it as the backend; you don't have to worry about that.
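(If the full preallocation is still unwanted, TF1-era TensorFlow lets you opt into on-demand GPU allocation via the standard `ConfigProto` session config; a minimal sketch, assuming a Keras-on-TensorFlow setup like the one in this thread:)

```python
import tensorflow as tf
from keras import backend as K

# By default TF1 grabs (nearly) all GPU memory up front.
# allow_growth makes it allocate incrementally as needed instead.
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
K.set_session(tf.Session(config=config))
```

Note this only changes GPU allocation behaviour; it does not help with host (system) RAM usage.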
@leftthomas Thanks for your reply again. I know the GPU RAM will be fully occupied, which is totally fine, but my problem is that the Windows system's RAM gets fully occupied by the Python kernel, resulting in a memory explosion.
@tsing90 I haven't encountered that situation on Ubuntu 16.04 or macOS; maybe it only occurs on Windows.
@leftthomas Thanks for letting me know; I'll definitely try your code on Linux. Much appreciated.
@leftthomas Sorry to trouble you once more. I tried on Ubuntu 16.04, but the Python kernel still swapped a lot of system memory (more than 5 GB) until it was killed by the system. Could you please double-check the code you uploaded, and tell me how much system memory was used when you ran it? Many thanks.
@tsing90 I have run my code on a 16 GB i7 CPU machine with an Nvidia GTX 1070 GPU, with no errors. The loss function has to load VGG16 before computing, so it uses a lot of memory; you could load VGG16 in main.py and pass it to the loss function, so it is only loaded once.
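The "load once, pass it in" pattern suggested here can be sketched in plain Python; `FakeVGG` is a hypothetical stand-in for the real VGG16 model (the actual wiring depends on the repo's loss code):

```python
# Build the expensive feature extractor a single time (e.g. in main.py)
# and close over it in the loss function, instead of reloading it on
# every loss evaluation.

class FakeVGG:  # stand-in for keras.applications.VGG16 (assumption)
    loads = 0
    def __init__(self):
        FakeVGG.loads += 1  # count how many times the "model" is loaded
    def features(self, x):
        return sum(x)       # toy "feature extraction"

def make_perceptual_loss(extractor):
    # The returned closure reuses the already-loaded extractor.
    def loss(y_true, y_pred):
        return abs(extractor.features(y_true) - extractor.features(y_pred))
    return loss

vgg = FakeVGG()                      # loaded once, up front
loss_fn = make_perceptual_loss(vgg)  # passed into the loss
print(loss_fn([1, 2], [1, 3]), FakeVGG.loads)  # → 1 1
```

However many times `loss_fn` is called, `FakeVGG.loads` stays at 1, which is exactly the memory saving being described.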
Thanks a lot. It actually takes a lot of memory to compile the computation graph, but once computation moves to the GPU it's totally fine, as some memory is released. Your reply made my day, thanks.
Thanks for sharing your code. I got stuck when running it; the error was:
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[2,256,256,640]
The error was raised at main.py, line 58: d_on_g_loss = d_on_g.train_on_batch(image_blur_batch, [1] * batch_size)
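Aside from the OOM itself, the 'int' object has no attribute 'shape' error reported earlier in this thread may come from passing a plain Python list as the target here; some Keras versions expect an array with a shape. A minimal sketch of the workaround (batch_size value is just an example):

```python
import numpy as np

batch_size = 2  # example value (assumption)

# A plain list like [1] * batch_size can break inside Keras when it
# inspects .shape on the targets; a NumPy array of the right shape works:
targets = np.ones((batch_size, 1))

# d_on_g_loss = d_on_g.train_on_batch(image_blur_batch, targets)
print(targets.shape)  # (2, 1)
```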
Another strange thing was that my GPU (Titan Xp) was fully allocated, but no computation was actually executed there.
The environment is Win10 + Keras 2.0 + TF1.1
Thanks