zychen-ustc / PSD-Principled-Synthetic-to-Real-Dehazing-Guided-by-Physical-Priors

Zeyuan Chen, Yangchao Wang, Yang Yang and Dong Liu. "PSD: Principled Synthetic-to-Real Dehazing Guided by Physical Priors". IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2021.
MIT License

Insufficient GPU memory during training #21

Closed · Tennyson0331 closed this issue 2 years ago

Tennyson0331 commented 2 years ago

When running main.py (with the FFA backbone), I get the following error:

RuntimeError: CUDA out of memory. Tried to allocate 32.00 MiB (GPU 0; 6.00 GiB total capacity; 4.30 GiB already allocated; 25.12 MiB free; 4.30 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

The error persists even after reducing batch_size, num_workers, and so on, and even after cutting the training set down to a few dozen images. Is the memory overhead of the training algorithm itself really this large? Looking forward to your reply.
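Note: the error message itself suggests one mitigation, capping the allocator's split size via PYTORCH_CUDA_ALLOC_CONF to reduce fragmentation. This does not lower the model's actual memory demand, but it can help when fragmentation is part of the problem. A minimal sketch of how that setting is typically applied; the 128 MB value is only an illustrative starting point, not something this repo prescribes:

```python
import os

# Cap the allocator's split size to reduce fragmentation.
# Must take effect before the first CUDA allocation, so set it before
# importing torch (or export it in the shell before launching main.py).
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

import torch  # imported after the env var so the setting takes effect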

zychen-ustc commented 2 years ago

Hi, the GPU memory overhead of training this algorithm is indeed large. On a single 11 GB card, batch_size can usually be set to 1 or 2. To further reduce memory consumption, you can also try lowering the input image resolution.
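For reference, a minimal sketch of what those two knobs (batch size and crop/input size) look like in a generic PyTorch data pipeline. The dataset class below is a self-contained placeholder, not the repo's actual dataset or argument names in main.py:

```python
import torch
from torch.utils.data import Dataset, DataLoader

class HazyPairs(Dataset):
    """Placeholder for the repo's real dataset; yields random crops of a given size."""
    def __init__(self, n_images=10, crop_size=160):
        self.n_images, self.crop_size = n_images, crop_size
    def __len__(self):
        return self.n_images
    def __getitem__(self, idx):
        c = self.crop_size
        # hazy input and clear target, both 3 x c x c
        return torch.rand(3, c, c), torch.rand(3, c, c)

# Activation memory scales roughly with batch_size * crop_size^2,
# so halving the crop side cuts it by about 4x.
loader = DataLoader(HazyPairs(crop_size=160), batch_size=1, shuffle=True)
```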

Tennyson0331 commented 2 years ago


Thank you for your help.

JOoooooOOOOOoe commented 2 years ago


Resize the images to 128, or even 64; otherwise you won't be able to fit the fine-tuning stage at all.
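If the data pipeline goes through torchvision transforms, the resize suggested above might look like the sketch below; the transform chain is illustrative and may not match the repo's exact loading code:

```python
from torchvision import transforms

# Downscale inputs to 128x128 before tensor conversion (use 64 for even tighter memory).
# Whether Resize or a random crop fits better depends on how the image pairs are loaded.
to_small_tensor = transforms.Compose([
    transforms.Resize((128, 128)),
    transforms.ToTensor(),
])
```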

zychen-ustc commented 2 years ago

Also, the network models in our framework are all built on architectures from other works. So if you can find a more lightweight dehazing network architecture, the GPU memory requirement will drop accordingly.

xiexindan commented 2 years ago


Training on a server (with a larger-memory GPU) solves this problem.