VincentChong123 closed this issue 5 years ago.
Hi @liamcli,
For your cifar10_model.pt, using PyTorch 1.1 on a GTX 2080 Ti, I managed to run cnn/test.py on CIFAR-10 with batch size 56.
> Only a single GPU is required. NOTE: PyTorch 0.4 is not supported at this moment and would lead to OOM.
For training, is it possible to use torch.nn.parallel to work around the OOM?
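As a general pattern (not specific to this repo, and with a stand-in model rather than the DARTS network), multi-GPU data parallelism in PyTorch is usually added by wrapping the model; a minimal sketch:

```python
import torch
import torch.nn as nn

# Stand-in for the DARTS network; the real model would come from model.py.
model = nn.Sequential(nn.Conv2d(3, 8, 3), nn.ReLU())

if torch.cuda.device_count() > 1:
    # DataParallel splits each batch across GPUs, so activation memory
    # per GPU drops roughly in proportion to the number of devices.
    model = nn.DataParallel(model)
if torch.cuda.is_available():
    model = model.cuda()
```

Note that DataParallel replicates the parameters on every GPU, so it mainly reduces per-GPU activation memory, not parameter memory; whether that is enough to avoid this particular OOM would need to be tested.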
> Only a single GPU is required. NOTE: PyTorch 0.4 is not supported at this moment and would lead to OOM.
Do you mean the OOM below? I wonder whether PyTorch 0.4 or multiple GPUs could solve the error below.
With PyTorch 1.1, CUDA 10.0, cuDNN 7.x, running `train.py --auxiliary --cutout` (default batch size 96 for CIFAR-10, ~80% GPU memory utilization):

```
06/17 01:31:31 PM param size = 3.349342MB
06/17 01:31:31 PM Model total parameters: 3825768
06/17 01:31:33 PM train 000 3.308731e+00 9.375000 40.625000
06/17 01:31:49 PM train 050 3.192118e+00 12.357026 57.107841
...
06/17 01:34:20 PM train 500 2.565969e+00 31.686626 82.054638
06/17 01:34:26 PM train_acc 32.043999
  File "/opt/venv/usr-python/python3.6/tf-nightly-gpu/lib/python3.6/site-packages/torch/nn/functional.py", line 1697, in batch_norm
    training, momentum, eps, torch.backends.cudnn.enabled
RuntimeError: CUDA out of memory. Tried to allocate 14.00 MiB (GPU 0; 10.73 GiB total capacity; 9.35 GiB already allocated; 12.56 MiB free; 554.67 MiB cached)
```
BTW, thanks for sharing the great talk! https://slideslive.com/38916590/random-search-and-reproducibility-for-neural-architecture-search?locale=cs
This is a fork of the code available at https://github.com/quark0/darts
If the OOM issue happens during evaluation (inference only), then you can look into using `torch.no_grad()` instead of the volatile argument.
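A minimal sketch of that change, assuming a typical evaluation loop (the model and tensor names here are illustrative, not from test.py):

```python
import torch

model = torch.nn.Linear(4, 2)  # stand-in for the evaluated network
model.eval()
data = torch.randn(8, 4)

# Old (PyTorch <= 0.3): input = Variable(data, volatile=True)
# New (PyTorch >= 0.4): wrap the forward pass so no autograd graph is
# built, which avoids holding activations and reduces memory use.
with torch.no_grad():
    logits = model(data)
```

Inside the `with` block, outputs have `requires_grad == False`, so no backward graph is retained between batches.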
Thanks @liamcli!
Hi @liamcli,
Thanks for sharing your work.
Does the OOM above refer to the error below?
```
darts/cnn/test.py:86: UserWarning: volatile was removed and now has no effect. Use
  with torch.no_grad():
instead.
  target = Variable(target, volatile=True).cuda(async=True)
06/17 12:18:42 PM test 000 1.233735e-01 96.875000 100.000000
...
  File "/opt/venv/usr-python/python3.6/tf-nightly-gpu/lib/python3.6/site-packages/torch/nn/functional.py", line 1697, in batch_norm
    training, momentum, eps, torch.backends.cudnn.enabled
RuntimeError: CUDA out of memory. Tried to allocate 14.00 MiB (GPU 0; 10.73 GiB total capacity; 9.37 GiB already allocated; 6.56 MiB free; 533.84 MiB cached)
```

My system uses PyTorch 1.1, CUDA 10.0, cuDNN 7.x.
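For reference, that warning comes from the old `Variable`/`volatile` API. A hedged sketch of how the deprecated line could be updated on PyTorch >= 0.4 (using a dummy tensor here, not the actual test.py data):

```python
import torch

target = torch.randint(0, 10, (8,))  # dummy stand-in for a label batch

# Old: target = Variable(target, volatile=True).cuda(async=True)
# New: `volatile` is gone (use torch.no_grad() around the inference loop
# instead), and `async` was renamed `non_blocking` because `async` became
# a reserved word in Python 3.7.
if torch.cuda.is_available():
    target = target.cuda(non_blocking=True)
```

This only silences the deprecation warnings; the OOM itself still needs `torch.no_grad()` around the forward pass (or a smaller batch size), as suggested above.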
Thank you.