Closed tsing90 closed 6 years ago
@tsing90 set batch_size = 1 or set channel_rate < 64, for example channel_rate = 16
@leftthomas Thanks for your reply. batch_size = 1 threw an error as well ('int' object has no attribute 'shape'), and setting channel_rate = 16 gave no error, but the Python kernel's memory kept growing (beyond 10 GB) and never stopped.
@tsing90 TensorFlow occupies 100% of the GPU memory by default when you use it as the backend; you don't have to worry about that.
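(If the full preallocation is still unwanted, TF1-era TensorFlow lets you opt into on-demand GPU allocation via the standard `ConfigProto` session config; a minimal sketch, assuming a Keras-on-TensorFlow setup like the one in this thread:)

```python
import tensorflow as tf
from keras import backend as K

# By default TF1 grabs (nearly) all GPU memory up front.
# allow_growth makes it allocate incrementally as needed instead.
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
K.set_session(tf.Session(config=config))
```

Note this only changes GPU allocation behaviour; it does not help with host (system) RAM usage.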
@leftthomas Thanks for your reply again. I know the GPU RAM will be fully occupied, which is totally fine, but my problem is that the Windows system's RAM gets fully occupied by the Python kernel, resulting in a memory explosion.
@tsing90 I haven't encountered that situation on Ubuntu 16.04 or macOS; maybe it only occurs on Windows.
@leftthomas Thanks for letting me know; I'll definitely try your code on Linux. Much appreciated.
@leftthomas Sorry to trouble you once more. I tried on Ubuntu 16.04, but the Python kernel still swapped a lot of system memory (more than 5 GB) until it was killed by the system. Could you please double-check the code you uploaded, and tell me how much system memory was used when you ran it? Many thanks.
@tsing90 I have run my code on a 16 GB i7 CPU machine with an Nvidia GTX 1070 GPU, with no errors. The loss function has to load VGG16 before computing, so it uses a lot of memory; you could load VGG16 in main.py and pass it to the loss function, so it is only loaded once.
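The "load once, pass it in" pattern suggested here can be sketched in plain Python; `FakeVGG` is a hypothetical stand-in for the real VGG16 model (the actual wiring depends on the repo's loss code):

```python
# Build the expensive feature extractor a single time (e.g. in main.py)
# and close over it in the loss function, instead of reloading it on
# every loss evaluation.

class FakeVGG:  # stand-in for keras.applications.VGG16 (assumption)
    loads = 0
    def __init__(self):
        FakeVGG.loads += 1  # count how many times the "model" is loaded
    def features(self, x):
        return sum(x)       # toy "feature extraction"

def make_perceptual_loss(extractor):
    # The returned closure reuses the already-loaded extractor.
    def loss(y_true, y_pred):
        return abs(extractor.features(y_true) - extractor.features(y_pred))
    return loss

vgg = FakeVGG()                      # loaded once, up front
loss_fn = make_perceptual_loss(vgg)  # passed into the loss
print(loss_fn([1, 2], [1, 3]), FakeVGG.loads)  # → 1 1
```

However many times `loss_fn` is called, `FakeVGG.loads` stays at 1, which is exactly the memory saving being described.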
Thanks a lot. It actually takes a lot of memory to compile the computation graph, but once computation moves to the GPU it's totally fine, as some memory is released. Your reply made my day, thanks.
Thanks for sharing your code. I got stuck when running it; the error was:
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[2,256,256,640]
The error was raised at main.py, line 58: d_on_g_loss = d_on_g.train_on_batch(image_blur_batch, [1] * batch_size)
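Aside from the OOM itself, the 'int' object has no attribute 'shape' error reported earlier in this thread may come from passing a plain Python list as the target here; some Keras versions expect an array with a shape. A minimal sketch of the workaround (batch_size value is just an example):

```python
import numpy as np

batch_size = 2  # example value (assumption)

# A plain list like [1] * batch_size can break inside Keras when it
# inspects .shape on the targets; a NumPy array of the right shape works:
targets = np.ones((batch_size, 1))

# d_on_g_loss = d_on_g.train_on_batch(image_blur_batch, targets)
print(targets.shape)  # (2, 1)
```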
Another strange thing was that my GPU (Titan Xp) was fully allocated, but no computation was actually executed there.
The environment is Win10 + Keras 2.0 + TF1.1
Thanks