CUDA out of memory - Githubissues

honeytidy commented 2 years ago

Got the following error when running the Replicate demo (Deblurring):

CUDA out of memory. Tried to allocate 5.41 GiB (GPU 0; 14.76 GiB total capacity; 9.14 GiB already allocated; 4.45 GiB free; 9.34 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
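The numbers in this message already show why the allocation fails: the request is larger than the free memory, and reserved memory is only slightly above allocated memory, so fragmentation is unlikely to be the main cause. A minimal sketch that pulls the figures out of the message follows. The helper name parse_oom is hypothetical, and it only handles fields reported in GiB.

```python
import re

MESSAGE = (
    "CUDA out of memory. Tried to allocate 5.41 GiB (GPU 0; 14.76 GiB total "
    "capacity; 9.14 GiB already allocated; 4.45 GiB free; 9.34 GiB reserved "
    "in total by PyTorch)"
)

def parse_oom(message):
    """Extract the GiB figures from a PyTorch OOM message (hypothetical helper)."""
    patterns = {
        "requested": r"Tried to allocate ([\d.]+) GiB",
        "total": r"([\d.]+) GiB total capacity",
        "allocated": r"([\d.]+) GiB already allocated",
        "free": r"([\d.]+) GiB free",
        "reserved": r"([\d.]+) GiB reserved",
    }
    return {key: float(re.search(p, message).group(1)) for key, p in patterns.items()}

stats = parse_oom(MESSAGE)
# How much more memory the single allocation needs than is currently free.
shortfall = stats["requested"] - stats["free"]
# Reserved minus allocated is the cached-but-unused memory that
# max_split_size_mb could help with; here it is small.
cached_unused = stats["reserved"] - stats["allocated"]
print(f"short by {shortfall:.2f} GiB, cached but unused {cached_unused:.2f} GiB")
```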

BTW, Colab gives the same error.

mayorx commented 2 years ago

Hi, honeytidy,

I ran the deblurring demo (720x1280) in Replicate / Colab, and there seems to be no problem. Might the 'out of GPU memory' error be caused by a large image size?

honeytidy commented 2 years ago

I tried it again, and it ran all right. But it doesn't seem to work well with my test image. Here is the result from the Replicate demo:

mayorx commented 2 years ago

Hi, honeytidy,

Please try the test image in our Colab demo. It might give better results.

I'm working on the Replicate demo, but it may take some time.

xiaohulihutu commented 2 years ago

@mayorx

Hi, I guess it's better to ask in this issue. I am having the same CUDA out of memory problem. I am one hundred percent sure it is caused by the large image size: I have tried small-resolution images and they work fine.

But I have a server with 4 x T4 GPUs, and I have adjusted num_gpu in the yml file:

num_gpu: 4 # set num_gpu: 0 for cpu mode

But it seems that still only one GPU is used. Is there anything else I need to adjust?

Thank you so much for your help!

mayorx commented 2 years ago

Hi, @xiaohulihutu. For single-image inference, using only one GPU is expected. May I ask about the image size in this "cuda out of memory" case?

A workaround is to crop the image into patches, restore each patch, and then stitch the patches back into a whole image. This can be done in this framework by modifying the testing config: 1) switch grids from false to true; 2) add two parameters, crop_size_w and crop_size_h, after the grids parameter. It may look like:

val:
  save_img: true
  grids: true
  crop_size_h: 512
  crop_size_w: 512
  .... (other parameters, e.g. metrics)
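The crop, restore, and stitch idea behind this config can be sketched in a few lines. This is an illustration of the technique only, not the repository's grids implementation: tile_offsets and restore_in_tiles are hypothetical names, the image is a single-channel nested list standing in for a tensor, and overlapping regions are simply averaged.

```python
import math

def tile_offsets(length, crop):
    """Start offsets of tiles of size `crop` that together cover [0, length)."""
    if length <= crop:
        return [0]
    count = math.ceil(length / crop)
    # Spread the tiles evenly so the last one ends exactly at the border.
    step = (length - crop) / (count - 1)
    return [round(k * step) for k in range(count)]

def restore_in_tiles(image, crop_h, crop_w, restore):
    """Apply `restore` to each tile and average the results where tiles overlap."""
    h, w = len(image), len(image[0])
    total = [[0.0] * w for _ in range(h)]
    count = [[0] * w for _ in range(h)]
    for top in tile_offsets(h, crop_h):
        for left in tile_offsets(w, crop_w):
            patch = [row[left:left + crop_w] for row in image[top:top + crop_h]]
            out = restore(patch)  # only one patch is processed at a time
            for di, out_row in enumerate(out):
                for dj, value in enumerate(out_row):
                    total[top + di][left + dj] += value
                    count[top + di][left + dj] += 1
    return [[t / c for t, c in zip(t_row, c_row)] for t_row, c_row in zip(total, count)]

# Usage: an identity "model" must reproduce the input after stitching.
image = [[float(i * 7 + j) for j in range(7)] for i in range(5)]
stitched = restore_in_tiles(image, 3, 3, restore=lambda patch: patch)
```

Peak memory then depends on the patch size rather than the full image size, at the cost of possible seams at patch borders.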

xiaohulihutu commented 2 years ago

@mayorx Thank you for your fast response. The image resolution is 3024*4032. It was taken with an iPhone.

I will try your crop to patches method and see how it works. Thank you very much!

prov3it commented 1 year ago


Got the exact same issue. Image size: 6680 x 4441

Disable distributed.
 load net keys <built-in method keys of dict object at 0x7fa4298c4900>
2022-12-09 18:04:27,421 INFO: Model [ImageRestorationModel] is created.
Traceback (most recent call last):
  File "basicsr/demo.py", line 61, in <module>
    main()
  File "basicsr/demo.py", line 49, in main
    model.test()
  File "/home/xxx/NAFNet/basicsr/models/image_restoration_model.py", line 247, in test
    pred = self.net_g(self.lq[i:j])
  File "/home/xxx/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/xxx/NAFNet/basicsr/models/archs/NAFNet_arch.py", line 136, in forward
    x = self.intro(inp)
  File "/home/xxx/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/xxx/.local/lib/python3.8/site-packages/torch/nn/modules/conv.py", line 463, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "/home/xxx/.local/lib/python3.8/site-packages/torch/nn/modules/conv.py", line 459, in _conv_forward
    return F.conv2d(input, weight, bias, self.stride,
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 7.09 GiB (GPU 0; 2.00 GiB total capacity; 939.42 MiB already allocated; 0 bytes free; 958.00 MiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
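The 7.09 GiB request is consistent with a single float32 activation for the whole image. The sketch below is a back-of-the-envelope estimate under stated assumptions: 64 output channels for the first convolution and padding of each side to a multiple of 16. Both values are guesses chosen because they reproduce the number in the traceback; they were not checked against the config that was actually used.

```python
import math

def activation_gib(height, width, channels=64, multiple=16, bytes_per_value=4):
    """Size in GiB of one float32 feature map (channels and padding are assumptions)."""
    padded_h = math.ceil(height / multiple) * multiple
    padded_w = math.ceil(width / multiple) * multiple
    return padded_h * padded_w * channels * bytes_per_value / 2**30

full_image = activation_gib(4441, 6680)  # the 6680 x 4441 image from this report
one_patch = activation_gib(512, 512)     # a 512 x 512 patch from the grids workaround
print(f"full image: {full_image:.2f} GiB, 512x512 patch: {one_patch:.4f} GiB")
```

Under these assumptions one feature map for the full image is far larger than the 2 GiB card, while a 512 x 512 patch needs about 0.06 GiB for the same layer.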

I tried the workaround above (grids: true with crop_size_h and crop_size_w) and got a new error:

Disable distributed.
 load net keys <built-in method keys of dict object at 0x7fb2f818d6c0>
2022-12-09 18:05:53,378 INFO: Model [ImageRestorationModel] is created.
Traceback (most recent call last):
  File "basicsr/demo.py", line 61, in <module>
    main()
  File "basicsr/demo.py", line 47, in main
    model.grids()
  File "/home/xxx/NAFNet/basicsr/models/image_restoration_model.py", line 110, in grids
    b, c, h, w = self.gt.size()
AttributeError: 'ImageRestorationModel' object has no attribute 'gt'

mexthecat commented 1 year ago

I ran into the same problem.

After adding

  crop_size_h: 256
  crop_size_w: 256

I got this error

Traceback (most recent call last):
  File "basicsr/test.py", line 70, in <module>
    main()
  File "basicsr/test.py", line 61, in main
    model.validation(
  File "basicsr/models/base_model.py", line 55, in validation
    return self.dist_validation(dataloader, current_iter, tb_logger, save_img, rgb2bgr, use_image)
  File "basicsr/models/image_restoration_model.py", line 285, in dist_validation
    self.grids_inverse()
  File "basicsr/models/image_restoration_model.py", line 184, in grids_inverse
    preds[0, :, i: i + crop_size_h, j: j + crop_size_w] += self.outs[cnt]
AttributeError: 'ImageRestorationModel' object has no attribute 'outs'

I couldn't find any outs except in model.test:

    def test(self):
        self.net_g.eval()
        with torch.no_grad():
            n = len(self.lq)
            outs = []
            m = self.opt['val'].get('max_minibatch', n)
            i = 0
            while i < n:
                j = i + m
                if j >= n:
                    j = n
                pred = self.net_g(self.lq[i:j])
                if isinstance(pred, list):
                    pred = pred[-1]
                outs.append(pred.detach().cpu())
                i = j

            self.output = torch.cat(outs, dim=0)
        self.net_g.train()

After changing it to

    def test(self):
        self.net_g.eval()
        with torch.no_grad():
            n = len(self.lq)
            self.outs = []
            m = self.opt['val'].get('max_minibatch', n)
            i = 0
            while i < n:
                j = i + m
                if j >= n:
                    j = n
                pred = self.net_g(self.lq[i:j])
                if isinstance(pred, list):
                    pred = pred[-1]
                self.outs.append(pred.detach().cpu())
                i = j

            self.output = torch.cat(self.outs, dim=0)
        self.net_g.train()

I got this error

Traceback (most recent call last):
  File "basicsr/test.py", line 70, in <module>
    main()
  File "basicsr/test.py", line 61, in main
    model.validation(
  File "basicsr/models/base_model.py", line 55, in validation
    return self.dist_validation(dataloader, current_iter, tb_logger, save_img, rgb2bgr, use_image)
  File "basicsr/models/image_restoration_model.py", line 285, in dist_validation
    self.grids_inverse()
  File "basicsr/models/image_restoration_model.py", line 184, in grids_inverse
    preds[0, :, i: i + crop_size_h, j: j + crop_size_w] += self.outs[cnt]
RuntimeError: output with shape [3, 256, 256] doesn't match the broadcast shape [4, 3, 256, 256]

I have to admit I have no idea what I'm doing here - Any help would be greatly appreciated.
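The shape mismatch in the last traceback can be explained by how test() collects its results: outs holds one entry per forward pass, and each entry is a whole mini-batch, so outs[cnt] is a batch of patches rather than a single patch. Only the concatenated self.output can be indexed per patch. The sketch below mimics this with nested lists instead of tensors. The shape and zeros helpers are hypothetical, 2 x 2 patches stand in for 256 x 256, and itertools.chain plays the role of torch.cat along dimension 0.

```python
from itertools import chain

def zeros(*dims):
    """Nested list of zeros with the given shape (hypothetical stand-in for a tensor)."""
    if len(dims) == 1:
        return [0.0] * dims[0]
    return [zeros(*dims[1:]) for _ in range(dims[0])]

def shape(x):
    """Shape of a nested list, e.g. [[0, 0], [0, 0]] -> (2, 2)."""
    dims = []
    while isinstance(x, list):
        dims.append(len(x))
        x = x[0]
    return tuple(dims)

batch = zeros(4, 3, 2, 2)  # four patches processed in a single forward pass
outs = [batch]             # test() appends one entry per forward pass

per_call = shape(outs[0])  # a whole batch: (4, 3, 2, 2)

# Joining the batches along the first axis gives one entry per patch.
output = list(chain.from_iterable(outs))
per_patch = shape(output[0])  # a single patch: (3, 2, 2)
print(per_call, per_patch)
```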

avmm9898 commented 1 year ago

@mexthecat It works with these changes:

1. Change NAFNet-width32.yml:

val:
  save_img: true
  grids: true
  crop_size_h: 512
  crop_size_w: 512

2. Add 'gt' in basicsr\demo.py at line 44:

model.feed_data(data={'lq': img.unsqueeze(dim=0),'gt': img.unsqueeze(dim=0)})

3. Change 'outs' to 'output' in basicsr\models\image_restoration_model.py at line 183:

preds[0, :, i: i + crop_size_h, j: j + crop_size_w] += self.output[cnt]

4. Replace m with a smaller value in test():

    def test(self):
        self.net_g.eval()
        with torch.no_grad():
            n = len(self.lq)
            outs = []
            m = self.opt['val'].get('max_minibatch', n)
            m = 1  # set m here: number of patches per forward pass (overrides max_minibatch)
            i = 0
            while i < n:
                j = i + m
                if j >= n:
                    j = n
                pred = self.net_g(self.lq[i:j])
                if isinstance(pred, list):
                    pred = pred[-1]
                outs.append(pred.detach().cpu())
                i = j

            self.output = torch.cat(outs, dim=0)
        self.net_g.train()
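The loop in test() walks over the patches in chunks of m, so a smaller m means fewer patches on the GPU at once. Since m is read with self.opt['val'].get('max_minibatch', n), adding max_minibatch: 1 under val in the yml file might achieve the same effect without editing the code; that is an assumption about how the config is passed through, not something confirmed in this thread. The sketch below isolates the chunking logic, with minibatch_ranges as a hypothetical name.

```python
def minibatch_ranges(n, m):
    """Yield (start, stop) index pairs that split n items into chunks of at most m."""
    i = 0
    while i < n:
        j = min(i + m, n)  # same effect as the `if j >= n: j = n` clamp
        yield i, j
        i = j

n = 5  # number of patches
opt = {"val": {"max_minibatch": 2}}         # hypothetical config contents
m = opt["val"].get("max_minibatch", n)      # falls back to all patches in one batch
ranges = list(minibatch_ranges(n, m))

default_m = {}.get("max_minibatch", n)      # key absent: one batch of size n
single_batch = list(minibatch_ranges(n, default_m))
print(ranges, single_batch)
```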

If it still runs out of memory after those modifications, try setting a smaller max_split_size_mb in the terminal (set is the Windows cmd syntax; use export on Linux or macOS):

set PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:512

Now it should work without problems.
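The PYTORCH_CUDA_ALLOC_CONF setting can also be applied from Python instead of the shell, as long as it happens before PyTorch initializes its CUDA allocator. A minimal sketch follows; placing it at the very top of the entry script is a suggestion, not something taken from the repository. Note that max_split_size_mb only reduces fragmentation, so it will not help when a single allocation is larger than the whole GPU.

```python
import os

# Must be set before the first CUDA allocation; doing it before
# `import torch` is the safest ordering.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:512"

# import torch  # import PyTorch only after the variable is set

alloc_conf = os.environ["PYTORCH_CUDA_ALLOC_CONF"]
print(alloc_conf)
```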

megvii-research / NAFNet

CUDA out of memory #11