Closed: a2018c closed this issue 6 years ago
(For other people who hit the same issue: my workaround is to set the batch size to 8.)
Hi! You are definitely right. You've got an OOM error because the network cannot fit into the available GPU memory. To reduce the memory footprint you can decrease some of its parameters: batch size, growth rate, number of layers, etc.
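To see why batch size is usually the first knob to turn: the tensor named in the error has the batch dimension first, so its size scales linearly with the batch size. Below is a rough back-of-the-envelope sketch (plain Python; the 4 bytes per element assumes float32, and real peak usage will be much higher because the framework keeps many such activations alive for the backward pass):

```python
# Size of the single activation that failed to allocate:
# shape [64, 372, 32, 32], float32 (4 bytes per element).
batch, channels, height, width = 64, 372, 32, 32
bytes_per_elem = 4  # float32

size_mib = batch * channels * height * width * bytes_per_elem / 1024**2
print(f"activation at batch size 64: {size_mib:.0f} MiB")    # ~93 MiB

# A depth-100 DenseNet keeps many intermediate tensors of comparable size
# around during training, so total memory is far larger than any single
# tensor. Cutting the batch size from 64 to 8 shrinks every such
# activation by the same factor of 8.
print(f"activation at batch size 8:  {size_mib / 8:.1f} MiB")  # ~11.6 MiB
```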
Hi,
When training with "--growth_rate=12 --depth=100 --dataset=C100", it fails with: "ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[64,372,32,32] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc"
From the GPU usage (nvidia-smi output below), I found that it only uses GPU 0 and hits the OOM:
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 107...  Off  | 00000000:01:00.0  On |                  N/A |
|  0%   61C    P0    58W / 180W |   7940MiB /  8112MiB |      3%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX 107...  Off  | 00000000:02:00.0 Off |                  N/A |
|  0%   44C    P2    41W / 180W |    181MiB /  8114MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
How can I resolve this? Regards