swz30 / MPRNet

[CVPR 2021] Multi-Stage Progressive Image Restoration. SOTA results for image deblurring, deraining, and denoising.

RuntimeError: CUDA out of memory #13

Closed uV3301 closed 3 years ago

uV3301 commented 3 years ago

While testing a pretrained model with the command:

python demo.py --task Deblurring --input_dir path_to_images --result_dir save_images_here

I am getting a runtime error. The terminal shows this:

RuntimeError: CUDA out of memory. Tried to allocate 2.15 GiB (GPU 0; 1.96 GiB total capacity; 420.11 MiB already allocated; 400.00 MiB free; 233.89 MiB cached)

I am guessing this is because my GPU's memory is already partly occupied and is not sufficient for an allocation the program attempts at runtime. Since the requested allocation is already greater than the card's total capacity, could you suggest some methods or code changes to reduce this allocation size? Thanks

adityac8 commented 3 years ago

Hi @uV3301

Our model won't work on GPUs with 2 GB memory. You can either run it on CPU or use a graphic card with more memory.

Thanks
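
For anyone unsure whether their card is large enough, here is a rough, hedged check (not part of the repo; the 8 GiB threshold is only a guess based on the numbers reported in this thread):

# Rough sanity check, not part of MPRNet: report how much memory GPU 0 has
# before deciding between GPU and CPU inference. The 8 GiB cut-off is only a
# guess based on the figures mentioned in this thread.
import torch

if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    total_gib = props.total_memory / 1024 ** 3
    print(f"GPU 0: {props.name}, {total_gib:.1f} GiB total")
    if total_gib < 8:
        print("Probably too little memory for the deblurring demo; "
              "consider CPU inference or a larger card.")
else:
    print("No CUDA device detected; the demo would need the CPU changes discussed below.")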

J9nZHANG commented 3 years ago

When I use a 1080 Ti to run demo.py (deblur task), the out of memory error also comes up.

uV3301 commented 3 years ago

@adityac8 Could you suggest some edits we could make to run the program on the CPU? As of now, I guess the CUDA toolkit is a must to run it.

fhdbbk commented 3 years ago

> Our model won't work on GPUs with 2 GB memory. You can either run it on CPU or use a graphic card with more memory.

Hi @adityac8,

I tried to run it on Colab and it is throwing a memory error there too. How much memory is sufficient for running this?

I am getting the following error. RuntimeError: CUDA out of memory. Tried to allocate 1.00 GiB (GPU 0; 15.90 GiB total capacity; 13.58 GiB already allocated; 367.75 MiB free; 1.30 GiB cached)

J9nZHANG commented 3 years ago

I tried to test on a 3090 and it works. About 22 GB of memory is used for the test.

fhdbbk commented 3 years ago

Oh! That is a lot for me at present. Thanks for the update!

adityac8 commented 3 years ago

Hi,

We have tested our deblurring model for GoPro images of resolution 1280x720 on a Titan XP with 12 GB memory.

Thanks
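
Since activation memory grows with the pixel count, a small hedged helper (not part of the repo; "path_to_images" and the *.png pattern are placeholders matching the demo command above) can flag inputs larger than the tested resolution before you run the demo:

# Hedged helper, not from the repo: warn about images larger than the 1280x720
# resolution the authors report testing on. "path_to_images" and the *.png
# pattern are placeholders; adjust them to your data.
import glob
import os
from PIL import Image

TESTED_PIXELS = 1280 * 720
for path in sorted(glob.glob(os.path.join("path_to_images", "*.png"))):
    w, h = Image.open(path).size
    if w * h > TESTED_PIXELS:
        print(f"{os.path.basename(path)}: {w}x{h} is larger than the tested "
              "1280x720, so expect noticeably higher GPU memory use.")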

WuYang-WY commented 3 years ago

When I ran deblurring on a 12 GB 2080 Ti, the maximum batch_size could only be 1.

bmccord2 commented 3 years ago

I ran out of memory on the Deblurring model with a 1080 Ti. I do have about 213 MiB of memory allocated to GNOME, but nothing else. The image was 1280x1024.

Traceback (most recent call last):
  File "demo.py", line 74, in <module>
    restored = model(input_)
  File "/home/bmccord/virtual_envs/pytorch1/lib64/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "Deblurring/MPRNet.py", line 343, in forward
    x3_cat = self.stage3_orsnet(x3_cat, feat2, res2)
  File "/home/bmccord/virtual_envs/pytorch1/lib64/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "Deblurring/MPRNet.py", line 225, in forward
    x = self.orb1(x)
  File "/home/bmccord/virtual_envs/pytorch1/lib64/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "Deblurring/MPRNet.py", line 197, in forward
    res = self.body(x)
  File "/home/bmccord/virtual_envs/pytorch1/lib64/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/bmccord/virtual_envs/pytorch1/lib64/python3.6/site-packages/torch/nn/modules/container.py", line 92, in forward
    input = module(input)
  File "/home/bmccord/virtual_envs/pytorch1/lib64/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "Deblurring/MPRNet.py", line 54, in forward
    res = self.body(x)
  File "/home/bmccord/virtual_envs/pytorch1/lib64/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/bmccord/virtual_envs/pytorch1/lib64/python3.6/site-packages/torch/nn/modules/container.py", line 92, in forward
    input = module(input)
  File "/home/bmccord/virtual_envs/pytorch1/lib64/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/bmccord/virtual_envs/pytorch1/lib64/python3.6/site-packages/torch/nn/modules/activation.py", line 961, in forward
    return F.prelu(input, self.weight)
  File "/home/bmccord/virtual_envs/pytorch1/lib64/python3.6/site-packages/torch/nn/functional.py", line 1121, in prelu
    return torch.prelu(input, weight)
RuntimeError: CUDA out of memory. Tried to allocate 640.00 MiB (GPU 0; 10.91 GiB total capacity; 9.04 GiB already allocated; 259.50 MiB free; 845.21 MiB cached)
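
One hedged workaround worth trying (not the repo's official fix, and it may cost a little accuracy or be unsupported on some GPU/PyTorch combinations): run inference in half precision, which roughly halves activation memory. `model` and `input_` below stand for the MPRNet instance and the preprocessed image tensor that demo.py already builds.

# Hedged fp16 experiment, not the official fix: `model` and `input_` are
# assumed to be the objects demo.py already creates.
import torch

model = model.half().eval()                 # cast weights to fp16
with torch.no_grad():
    restored = model(input_.half())         # cast the input to fp16 as well
restored = torch.clamp(restored[0].float(), 0, 1)   # back to fp32 before saving
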
swz30 commented 3 years ago

Hi all,

You can free up the memory of redundant variables right after each stage. For example, add the following at L309 of Deblurring/MPRNet.py:

if not self.training:
    # print(f"Is model in training mode? {self.training}")
    del stage1_img_top, stage1_img_bot, stage1_img, feat1_ltop, feat1_rtop, feat1_lbot, feat1_rbot, x1ltop, x1rtop, x1lbot, x1rbot
    torch.cuda.empty_cache()

Add the following at L332:

if not self.training:
    del stage2_img, x2top, x2bot, x2top_cat, x2bot_cat, feat2_top, feat2_bot
    torch.cuda.empty_cache()

And finally, change the return statement to this:

 return [stage3_img+x3_img] if not self.training else [stage3_img+x3_img, stage2_img, stage1_img]
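
For readers who want to see the pattern end to end, here is a minimal runnable toy (not MPRNet itself, just a two-stage stand-in with made-up layer and tensor names) showing where the del / torch.cuda.empty_cache() calls and the conditional return sit:

# Minimal runnable sketch of the pattern above; NOT MPRNet, just a toy.
# Intermediates are freed only in eval mode because training still needs
# every stage output for the multi-stage loss.
import torch
import torch.nn as nn

class TwoStageToy(nn.Module):
    def __init__(self):
        super().__init__()
        self.stage1 = nn.Conv2d(3, 3, 3, padding=1)
        self.stage2 = nn.Conv2d(3, 3, 3, padding=1)

    def forward(self, x):
        feat1 = self.stage1(x)
        stage1_img = feat1 + x            # stage-1 restored image (toy version)
        feat2 = self.stage2(feat1)

        if not self.training:
            # Inference only: these tensors are just loss targets / leftovers,
            # so release them before the final stage allocates more memory.
            del feat1, stage1_img
            if torch.cuda.is_available():
                torch.cuda.empty_cache()  # hand the freed blocks back to CUDA

        stage2_img = feat2 + x
        # Inference returns only the final image; training returns all stages.
        return [stage2_img] if not self.training else [stage2_img, stage1_img]

if __name__ == "__main__":
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = TwoStageToy().to(device).eval()
    with torch.no_grad():
        out = model(torch.rand(1, 3, 64, 64, device=device))
    print(out[0].shape)
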
uV3301 commented 3 years ago

> I tried to test on a 3090 and it works. About 22 GB of memory is used for the test.

Damn, this model requires too much. I don't think I can run it on my machine then. Maybe I could allocate on the CPU instead of the GPU.

uV3301 commented 3 years ago

> You can free up the memory of redundant variables right after each stage. [...]

I will try to implement this. Thanks

adityac8 commented 3 years ago

> Could you suggest some edits we could make to run the program on the CPU? As of now, I guess the CUDA toolkit is a must to run it.

If you want to run our model on CPU, remove cuda() from
https://github.com/swz30/MPRNet/blob/51b58bb2ec803162e9053c1269b170009ee6f693/Deblurring/test.py#L40
https://github.com/swz30/MPRNet/blob/51b58bb2ec803162e9053c1269b170009ee6f693/Deblurring/test.py#L57

Thanks
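
A hedged sketch of what CPU-only loading can look like after those edits; the checkpoint path and the "state_dict" key follow the repo's usual layout but may differ in your checkout, and the dummy tensor stands in for a real preprocessed image:

# Hedged CPU-only sketch, assumed to be run from the Deblurring/ folder.
# The checkpoint path and the "state_dict" key are assumptions; adjust them
# if your checkout differs.
import torch
from MPRNet import MPRNet

model = MPRNet()
checkpoint = torch.load("pretrained_models/model_deblurring.pth",
                        map_location=torch.device("cpu"))  # remap CUDA tensors to CPU
model.load_state_dict(checkpoint["state_dict"])
model.eval()

with torch.no_grad():
    dummy = torch.rand(1, 3, 256, 256)            # small stand-in for a real image
    restored = torch.clamp(model(dummy)[0], 0, 1)
print(restored.shape)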

BLOO123 commented 3 years ago

> If you want to run our model on CPU, remove cuda() from
> https://github.com/swz30/MPRNet/blob/51b58bb2ec803162e9053c1269b170009ee6f693/Deblurring/test.py#L40
> https://github.com/swz30/MPRNet/blob/51b58bb2ec803162e9053c1269b170009ee6f693/Deblurring/test.py#L57

I have tried this method but the CUDA OOM error is still displayed. Could it be due to the .cuda() call on line 64 of demo.py? How do I work around this? Thank you!
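
A hedged sketch of the usual workaround (line 64 and the exact contents of demo.py may differ in your checkout): choose the device once and replace every .cuda() call with .to(device), including the checkpoint load, so the same script runs with or without a GPU.

# Hedged sketch, not the actual demo.py: pick the device once, then use
# .to(device) everywhere the script currently calls .cuda().
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Running on {device}")

# In demo.py the replacements would look roughly like:
#   model.cuda()                      ->  model.to(device)
#   checkpoint = torch.load(weights)  ->  torch.load(weights, map_location=device)
#   input_ = <tensor>.cuda()          ->  input_ = <tensor>.to(device)

# Minimal self-contained demonstration of the pattern:
model = torch.nn.Conv2d(3, 3, 3, padding=1).to(device)
with torch.no_grad():
    out = model(torch.rand(1, 3, 64, 64, device=device))
print(out.shape, out.device)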