xinntao / Real-ESRGAN

Real-ESRGAN aims at developing Practical Algorithms for General Image/Video Restoration.
BSD 3-Clause "New" or "Revised" License

CUDA out of memory when fine-tuning #616

Open HenryKang1 opened 1 year ago

HenryKang1 commented 1 year ago

My images are 512 x 512. I set the scale to 1 and gt_size to 512. I also changed the crop/pad size from 400 to 512. The batch size is 1. Training from scratch with the "SRVGGNetCompact" architecture works. However, when I fine-tune, I get the error below. Is there any solution? I also cannot train ESRNet because of the same CUDA error. I also tested on a GPU with 48 GB of memory and batch size 1, and it still has the same issue.

Traceback (most recent call last):
  File "realesrgan/train.py", line 11, in <module>
    train_pipeline(root_path)
  File "C:\Users\dongh.conda\envs\basicsr\lib\site-packages\basicsr\train.py", line 169, in train_pipeline
    model.optimize_parameters(current_iter)
  File "d:\sr_code\real-esrgan\realesrgan\models\realesrgan_model.py", line 210, in optimize_parameters
    self.output = self.net_g(self.lq)
  File "C:\Users\dongh.conda\envs\basicsr\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Users\dongh.conda\envs\basicsr\lib\site-packages\basicsr\archs\rrdbnet_arch.py", line 113, in forward
    body_feat = self.conv_body(self.body(feat))
  File "C:\Users\dongh.conda\envs\basicsr\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Users\dongh.conda\envs\basicsr\lib\site-packages\torch\nn\modules\container.py", line 139, in forward
    input = module(input)
  File "C:\Users\dongh.conda\envs\basicsr\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Users\dongh.conda\envs\basicsr\lib\site-packages\basicsr\archs\rrdbnet_arch.py", line 59, in forward
    out = self.rdb1(x)
  File "C:\Users\dongh.conda\envs\basicsr\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Users\dongh.conda\envs\basicsr\lib\site-packages\basicsr\archs\rrdbnet_arch.py", line 35, in forward
    x3 = self.lrelu(self.conv3(torch.cat((x, x1, x2), 1)))
RuntimeError: CUDA out of memory. Tried to allocate 128.00 MiB (GPU 0; 24.00 GiB total capacity; 23.00 GiB already allocated; 0 bytes free; 23.09 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
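
A rough back-of-the-envelope check of the failing allocation may help narrow this down. The sketch below assumes float32 activations, batch size 1, and the RRDBNet defaults as I understand them (num_feat=64, num_grow_ch=32); none of these values are confirmed by the post. The helper `tensor_mib` is hypothetical and is not part of basicsr or PyTorch. Under those assumptions, the `torch.cat((x, x1, x2), 1)` in the last traceback frame needs exactly 128 MiB when the feature maps are 512 x 512, which matches the "Tried to allocate 128.00 MiB" message.

```python
def tensor_mib(batch, channels, height, width, bytes_per_elem=4):
    """Size in MiB of a dense tensor of shape (batch, channels, height, width)."""
    return batch * channels * height * width * bytes_per_elem / 2**20

# Assumed RRDBNet defaults (not confirmed by the post).
NUM_FEAT = 64      # channels of x
NUM_GROW_CH = 32   # channels of x1 and of x2

# Channels produced by torch.cat((x, x1, x2), 1) in the failing frame.
cat_channels = NUM_FEAT + 2 * NUM_GROW_CH

# Body running on full-resolution 512 x 512 features.
full_res = tensor_mib(1, cat_channels, 512, 512)

# Body running on features downscaled by 4 in each dimension (128 x 128).
unshuffled = tensor_mib(1, cat_channels, 128, 128)

print(f"full resolution: {full_res} MiB")   # 128.0 MiB
print(f"downscaled by 4: {unshuffled} MiB") # 8.0 MiB
```

If this reading is right, the network body is processing full 512 x 512 feature maps, where every intermediate tensor kept for backpropagation is 16 times larger than at 128 x 128. One thing worth checking, as a hypothesis rather than a confirmed diagnosis, is whether the `scale` under `network_g` in the fine-tuning config matches the top-level scale of 1, and whether gt_size can be reduced. Since reserved memory (23.09 GiB) is close to allocated memory (23.00 GiB), fragmentation tuning via max_split_size_mb seems unlikely to help here.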

dummyuser-123 commented 8 months ago

Have you found a solution for this error?