zychen-ustc / PSD-Principled-Synthetic-to-Real-Dehazing-Guided-by-Physical-Priors

Zeyuan Chen, Yangchao Wang, Yang Yang and Dong Liu. "PSD: Principled Synthetic-to-Real Dehazing Guided by Physical Priors". IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2021.
MIT License

Insufficient GPU memory during training #21

Closed · Tennyson0331 closed this issue 2 years ago

Tennyson0331 commented 2 years ago

When running main.py (with the FFA backbone), I get the following error:

RuntimeError: CUDA out of memory. Tried to allocate 32.00 MiB (GPU 0; 6.00 GiB total capacity; 4.30 GiB already allocated; 25.12 MiB free; 4.30 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

The error persists even after reducing batch_size, num_workers, and so on, and even after cutting the training set down to a few dozen images. Is the memory overhead of the training algorithm itself really this large? Looking forward to your reply.
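Note: the error message itself suggests one mitigation, capping the allocator's split size via PYTORCH_CUDA_ALLOC_CONF to reduce fragmentation. This does not lower the model's actual memory demand, but it can help when fragmentation is part of the problem. A minimal sketch of how that setting is typically applied; the 128 MB value is only an illustrative starting point, not something this repo prescribes:

```python
import os

# Cap the allocator's split size to reduce fragmentation.
# Must take effect before the first CUDA allocation, so set it before
# importing torch (or export it in the shell before launching main.py).
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

import torch  # imported after the env var so the setting takes effect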

zychen-ustc commented 2 years ago

Hi, the GPU memory overhead of training this algorithm is indeed large. On a single 11 GB card, batch_size can usually be set to 1 or 2. To further reduce memory consumption, you can also try lowering the input image resolution.
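For reference, a minimal sketch of what those two knobs (batch size and crop/input size) look like in a generic PyTorch data pipeline. The dataset class below is a self-contained placeholder, not the repo's actual dataset or argument names in main.py:

```python
import torch
from torch.utils.data import Dataset, DataLoader

class HazyPairs(Dataset):
    """Placeholder for the repo's real dataset; yields random crops of a given size."""
    def __init__(self, n_images=10, crop_size=160):
        self.n_images, self.crop_size = n_images, crop_size
    def __len__(self):
        return self.n_images
    def __getitem__(self, idx):
        c = self.crop_size
        # hazy input and clear target, both 3 x c x c
        return torch.rand(3, c, c), torch.rand(3, c, c)

# Activation memory scales roughly with batch_size * crop_size^2,
# so halving the crop side cuts it by about 4x.
loader = DataLoader(HazyPairs(crop_size=160), batch_size=1, shuffle=True)
```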

Tennyson0331 commented 2 years ago


Thank you for your help.

JOoooooOOOOOoe commented 2 years ago


Resize the images to 128, or even 64; otherwise you won't be able to fit the fine-tuning stage at all.
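If the data pipeline goes through torchvision transforms, the resize suggested above might look like the sketch below; the transform chain is illustrative and may not match the repo's exact loading code:

```python
from torchvision import transforms

# Downscale inputs to 128x128 before tensor conversion (use 64 for even tighter memory).
# Whether Resize or a random crop fits better depends on how the image pairs are loaded.
to_small_tensor = transforms.Compose([
    transforms.Resize((128, 128)),
    transforms.ToTensor(),
])
```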

zychen-ustc commented 2 years ago

Also, the network models in our framework are all built on architectures from other works. So if you can find a more lightweight dehazing network architecture, the GPU memory requirement will drop accordingly.

xiexindan commented 2 years ago


Training on a server (with a larger-memory GPU) solves this problem.