ikhlestov / vision_networks

Repo about neural networks for image handling
MIT License
264 stars, 122 forks

ResourceExhaustedError: OOM when allocating tensor #25

Closed: a2018c closed this issue 6 years ago

a2018c commented 6 years ago

Hi,

When I run with "--growth_rate=12 --depth=100 --dataset=C100", training fails with "ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[64,372,32,32] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc"

The GPU usage shows that only GPU 0 was used, and that is where the OOM occurred:

| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 107...  Off  | 00000000:01:00.0  On |                  N/A |
|  0%   61C    P0    58W / 180W |   7940MiB /  8112MiB |      3%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX 107...  Off  | 00000000:02:00.0 Off |                  N/A |
|  0%   44C    P2    41W / 180W |    181MiB /  8114MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

How can I resolve this? Regards

a2018c commented 6 years ago

(For anyone else who hits the same issue: my workaround was to set the batch size to 8.)

ikhlestov commented 6 years ago

Hi! You are definitely right. You got the OOM error because the network does not fit in the available GPU memory. To reduce its memory footprint, you can decrease the batch size, growth rate, number of layers, etc.
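
To get a rough feel for why a smaller batch helps, the size of the tensor named in the error can be estimated by hand. The snippet below is only a back-of-the-envelope sketch, not part of this repo: `tensor_mib` is a hypothetical helper, it assumes "type float" means 4-byte float32, and it assumes the first dimension of `shape[64,372,32,32]` is the batch size.

```python
from math import prod

BYTES_PER_FLOAT32 = 4  # assumption: "type float" in the error is 32-bit


def tensor_mib(shape, bytes_per_element=BYTES_PER_FLOAT32):
    """Return the size of a dense tensor with the given shape, in MiB."""
    return prod(shape) * bytes_per_element / 2**20


# Shape reported in the error message (first dimension assumed to be the batch).
reported = tensor_mib((64, 372, 32, 32))
# Same tensor with the batch size from the workaround above.
reduced = tensor_mib((8, 372, 32, 32))

print(f"batch 64: {reported:.1f} MiB")  # batch 64: 93.0 MiB
print(f"batch 8:  {reduced:.1f} MiB")   # batch 8:  11.6 MiB
```

This is a single activation. During training many such tensors are kept around for backpropagation, so the total scales roughly linearly with batch size, which is consistent with batch size 8 fitting where 64 did not.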