Open monkeyDemon opened 4 years ago
True, so I am only using ResNet and VGG
I've just updated my code and fixed this bug.
I've tested the updated code on Google Colab
(Python 3.6, PyTorch 1.6, a K80 GPU):
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.57 Driver Version: 418.67 CUDA Version: 10.1 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 Tesla K80 Off | 00000000:00:04.0 Off | 0 |
| N/A 45C P8 31W / 149W | 0MiB / 11441MiB | 0% Default |
| | | ERR! |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
Here is the output during training (seresnet152, batch_size=64). We can see that the GPU memory consumption is 7832 MB; you could try it yourself. If you get a different result, please let me know, thanks. @monkeyDemon @bokveizen
|===========================================================================|
| PyTorch CUDA memory summary, device ID 0 |
|---------------------------------------------------------------------------|
| CUDA OOMs: 0 | cudaMalloc retries: 0 |
|===========================================================================|
| Metric | Cur Usage | Peak Usage | Tot Alloc | Tot Freed |
|---------------------------------------------------------------------------|
| Allocated memory | 772034 KB | 6276 MB | 37813 GB | 37812 GB |
| from large pool | 438784 KB | 5926 MB | 37354 GB | 37354 GB |
| from small pool | 333250 KB | 480 MB | 458 GB | 458 GB |
|---------------------------------------------------------------------------|
| Active memory | 772034 KB | 6276 MB | 37813 GB | 37812 GB |
| from large pool | 438784 KB | 5926 MB | 37354 GB | 37354 GB |
| from small pool | 333250 KB | 480 MB | 458 GB | 458 GB |
|---------------------------------------------------------------------------|
| GPU reserved memory | 7832 MB | 7832 MB | 7832 MB | 0 B |
| from large pool | 7350 MB | 7350 MB | 7350 MB | 0 B |
| from small pool | 482 MB | 482 MB | 482 MB | 0 B |
|---------------------------------------------------------------------------|
| Non-releasable memory | 354366 KB | 1425 MB | 19691 GB | 19690 GB |
| from large pool | 351744 KB | 1423 MB | 19197 GB | 19197 GB |
| from small pool | 2622 KB | 33 MB | 493 GB | 493 GB |
|---------------------------------------------------------------------------|
| Allocations | 2940 | 3808 | 6708 K | 6705 K |
| from large pool | 141 | 549 | 2429 K | 2429 K |
| from small pool | 2799 | 3414 | 4278 K | 4275 K |
|---------------------------------------------------------------------------|
| Active allocs | 2940 | 3808 | 6708 K | 6705 K |
| from large pool | 141 | 549 | 2429 K | 2429 K |
| from small pool | 2799 | 3414 | 4278 K | 4275 K |
|---------------------------------------------------------------------------|
| GPU reserved segments | 499 | 499 | 499 | 0 |
| from large pool | 258 | 258 | 258 | 0 |
| from small pool | 241 | 241 | 241 | 0 |
|---------------------------------------------------------------------------|
| Non-releasable allocs | 106 | 121 | 3384 K | 3384 K |
| from large pool | 44 | 83 | 1058 K | 1058 K |
| from small pool | 62 | 77 | 2325 K | 2325 K |
|===========================================================================|
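The allocator report above can be reproduced after any training step. A minimal sketch, assuming PyTorch is installed; the toy model below is only a stand-in, not the repo's network:

```python
import torch
import torch.nn as nn

# Toy stand-in for the repo's model; any nn.Module works the same way.
device = "cuda" if torch.cuda.is_available() else "cpu"
model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(16, 100),
).to(device)

x = torch.randn(64, 3, 32, 32, device=device)  # batch_size=64, CIFAR-sized input
loss = model(x).sum()
loss.backward()

if device == "cuda":
    # Prints the same report format as the table above.
    print(torch.cuda.memory_summary(device=0))
```

Calling `torch.cuda.memory_summary()` right after `backward()` captures the peak usage of the step, which is what matters for OOM debugging.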
I found that mobilenet.py has a similar problem and occupies even more GPU memory. Can you check it? Thanks!
I found that googlenet.py also occupies so much GPU memory that when I train it on the ImageNet dataset, even 4 GPUs with 20 GB each are not enough.
Could you please tell me what your input image size and batch size are during training?
@weiaicunzai my input image size is 224x224. I tried setting the batch size to 128, 256, and 64, but none of them worked.
Thanks, I will try to reproduce the bug you mentioned. My GPU server is currently down due to hardware problems and has already been sent for repair; it might take a while, sorry.
I use 3 downsampling stages in GoogLeNet, which results in a larger feature map size during training; that's why memory consumption is high. Fewer downsampling stages are beneficial for small inputs like 32x32. I added one more downsampling layer to my GoogLeNet implementation, and GPU memory usage dropped from 14 GB to 7 GB during training on CIFAR-100, but accuracy also dropped by about 2 percent. If you are going to train on large input images (224x224), you could use 5 downsampling stages, just as in the original paper, to further reduce memory usage without losing much network performance.
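The memory effect of one extra stride-2 downsample can be sketched with a back-of-the-envelope activation-size estimate (the channel count and batch size below are illustrative, not measured from the repo):

```python
# Activation memory of one float32 feature map, in MB.
def feature_map_mb(batch, channels, height, width, bytes_per_elem=4):
    return batch * channels * height * width * bytes_per_elem / 2**20

# A 32x32 CIFAR input after 3 vs. 4 stride-2 downsamples (e.g. 256 channels):
three_down = feature_map_mb(64, 256, 4, 4)  # 1.0 MB per feature map
four_down = feature_map_mb(64, 256, 2, 2)   # 0.25 MB: 4x smaller

# Every layer after the extra downsample shrinks by the same 4x factor,
# which is why a single added stage can roughly halve total training memory.
```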
It seems some of the nets defined in models have hidden bugs. For example, when I use SENet I get a CUDA out of memory error, even though my batch size is only 64 and my GPU memory is 11 GB.
But when I use the model file from https://github.com/moskomule/senet.pytorch/tree/master/senet, it only occupies 7 GB with batch_size=90.
I find that senet.py, resnext.py, and inceptionv4.py all have similar problems, and maybe more models do.
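To compare two implementations like this fairly, peak memory for one training step can be measured under identical settings. A hedged helper sketch, assuming PyTorch (it returns None when no GPU is present; `build_senet` below is a placeholder, not a real constructor in either repo):

```python
import torch

def peak_memory_mb(model, batch_size=64, image_size=32):
    """Peak CUDA memory (MB) for one forward/backward pass, or None on CPU."""
    if not torch.cuda.is_available():
        return None
    torch.cuda.empty_cache()
    torch.cuda.reset_peak_memory_stats()
    model = model.cuda()
    x = torch.randn(batch_size, 3, image_size, image_size, device="cuda")
    model(x).sum().backward()
    return torch.cuda.max_memory_allocated() / 2**20

# Example: compare two SENet implementations under the same batch size.
# `build_senet` is a placeholder for whichever constructor you are testing.
# print(peak_memory_mb(build_senet(), batch_size=64))
```

Measuring `max_memory_allocated` (rather than nvidia-smi) excludes the allocator's cached-but-unused pages, so it isolates what the model itself needs.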