megvii-research / NAFNet

The state-of-the-art image restoration model without nonlinear activation functions.

Not enough memory available to process your request #126

Open clasking2 opened 11 months ago

clasking2 commented 11 months ago

Pasted log:

```
Traceback (most recent call last):
  File "/root/.pyenv/versions/3.9.12/lib/python3.9/site-packages/cog/server/worker.py", line 217, in _predict
    result = predict(payload)
  File "predict.py", line 79, in predict
    single_image_inference(model, inp, str(out_path))
  File "predict.py", line 101, in single_image_inference
    model.test()
  File "/src/basicsr/models/image_restoration_model.py", line 247, in test
    pred = self.net_g(self.lq[i:j])
  File "/root/.pyenv/versions/3.9.12/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/src/basicsr/models/archs/NAFNet_arch.py", line 141, in forward
    x = encoder(x)
  File "/root/.pyenv/versions/3.9.12/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/root/.pyenv/versions/3.9.12/lib/python3.9/site-packages/torch/nn/modules/container.py", line 141, in forward
    input = module(input)
  File "/root/.pyenv/versions/3.9.12/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/src/basicsr/models/archs/NAFNet_arch.py", line 62, in forward
    x = self.norm1(x)
  File "/root/.pyenv/versions/3.9.12/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/src/basicsr/models/archs/arch_util.py", line 300, in forward
    return LayerNormFunction.apply(x, self.weight, self.bias, self.eps)
  File "/src/basicsr/models/archs/arch_util.py", line 271, in forward
    var = (x - mu).pow(2).mean(1, keepdim=True)
RuntimeError: CUDA out of memory. Tried to allocate 4.27 GiB (GPU 0; 14.58 GiB total capacity; 10.46 GiB already allocated; 213.31 MiB free; 13.40 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
```
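As the error message itself suggests, when reserved memory is much larger than allocated memory, fragmentation may be part of the problem, and `PYTORCH_CUDA_ALLOC_CONF` can help. A minimal sketch of setting it from Python before CUDA is initialized (the 128 MiB value is an illustrative starting point, not a tuned recommendation):

```python
import os

# Must be set before PyTorch initializes CUDA, i.e. in most setups
# before `import torch` runs. 128 is an illustrative value; the right
# number depends on the allocation pattern that is fragmenting memory.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"
```

Exporting the same variable in the shell before launching the process works equally well.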

hayk-manukyan-dev commented 10 months ago

Same problem. Is there any solution?

ZhaoYeung commented 10 months ago

> Same problem. Is there any solution?

I met the same issue. Changing the batch size to a smaller one (4 in my case) solved the problem.

hayk-manukyan-dev commented 10 months ago

> I met the same issue. Changing the batch size to a smaller one (4 in my case) solved the problem.

I'm trying to use the test part, and the test .yml files don't have a batch size option. Can you help with that case?
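For reference, in BasicSR-style configs such as NAFNet's, the batch size lives in the training .yml, not the test one. An illustrative fragment (key names follow BasicSR conventions; the exact layout of a given config file may differ):

```yaml
# Illustrative fragment of a BasicSR-style training config
# (e.g. under options/train/); exact keys may vary per file.
datasets:
  train:
    batch_size_per_gpu: 4   # lower this if training runs out of GPU memory
```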

ZhaoYeung commented 10 months ago

> I'm trying to use the test part, and the test .yml files don't have a batch size option. Can you help with that case?

[screenshot: the batch size option in the training .yml]

hayk-manukyan-dev commented 10 months ago


I thought training and testing ran independently of each other, so that changing the training config would have no effect on the test commands. Now I get you, thank you very much 👍 👍 👍
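When the OOM happens at test time on a single full-resolution image (as in the traceback above), there is no batch to shrink; a common workaround is tiled inference: run the model on overlapping crops and stitch the outputs. The helper below only sketches the tiling arithmetic; it is a hypothetical function, not part of NAFNet, and blending the overlaps and calling the model are left out:

```python
def tile_windows(height, width, tile=256, overlap=32):
    """Return (top, left, bottom, right) crop windows that cover an
    image of the given size, with the requested overlap between tiles.
    Hypothetical helper for tiled inference; not part of NAFNet."""
    step = tile - overlap
    tops = list(range(0, max(height - tile, 0) + 1, step))
    lefts = list(range(0, max(width - tile, 0) + 1, step))
    # Make sure the final row/column of tiles reaches the image border.
    if tops[-1] + tile < height:
        tops.append(height - tile)
    if lefts[-1] + tile < width:
        lefts.append(width - tile)
    return [(t, l, min(t + tile, height), min(l + tile, width))
            for t in tops for l in lefts]
```

Each window can then be cropped from `self.lq`, passed through `self.net_g` on its own, and written back into an output buffer, keeping peak memory proportional to one tile instead of the whole image.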