I am trying to train on my own dataset with a single GTX 1060 6GB GPU, and it always runs out of memory at the third epoch. If you have any suggestions on how to fix this, I would be very grateful.
2018-06-21 08:53:49.853249: I tensorflow/core/common_runtime/bfc_allocator.cc:686] Stats:
Limit: 5856854016
InUse: 5832717824
MaxInUse: 5845060608
NumAllocs: 2163
MaxAllocSize: 1121255424
2018-06-21 08:53:49.853344: W tensorflow/core/common_runtime/bfc_allocator.cc:277] ****
2018-06-21 08:53:49.853378: W tensorflow/core/framework/op_kernel.cc:1198] Resource exhausted: OOM when allocating tensor with shape[2,50,50,1024] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
Traceback (most recent call last):
File "/root/.pyenv/versions/3.6.2/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1350, in _do_call
return fn(*args)
File "/root/.pyenv/versions/3.6.2/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1329, in _run_fn
status, run_metadata)
File "/root/.pyenv/versions/3.6.2/lib/python3.6/site-packages/tensorflow/python/framework/errors_impl.py", line 473, in __exit__
c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[2,100,100,512] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[Node: tower_5/resnet_v1_101_2/block2/unit_4/bottleneck_v1/conv3/Conv2D = Conv2D[T=DT_FLOAT, data_format="NHWC", dilations=[1, 1, 1, 1], padding="SAME", strides=[1, 1, 1, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](tower_5/resnet_v1_101_2/block2/unit_4/bottleneck_v1/conv2/Relu, resnet_v1_101/block2/unit_4/bottleneck_v1/conv3/weights/read/_1533)]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "train.py", line 265, in <module>
train(args)
File "train.py", line 213, in train
sess_ret = sess.run(sess2run, feed_dict=feed_dict)
File "/root/.pyenv/versions/3.6.2/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 895, in run
run_metadata_ptr)
File "/root/.pyenv/versions/3.6.2/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1128, in _run
feed_dict_tensor, options, run_metadata)
File "/root/.pyenv/versions/3.6.2/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1344, in _do_run
options, run_metadata)
File "/root/.pyenv/versions/3.6.2/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1363, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[2,100,100,512] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[Node: tower_5/resnet_v1_101_2/block2/unit_4/bottleneck_v1/conv3/Conv2D = Conv2D[T=DT_FLOAT, data_format="NHWC", dilations=[1, 1, 1, 1], padding="SAME", strides=[1, 1, 1, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](tower_5/resnet_v1_101_2/block2/unit_4/bottleneck_v1/conv2/Relu, resnet_v1_101/block2/unit_4/bottleneck_v1/conv3/weights/read/_1533)]]
How are you getting tower 4 when you're only running with one GPU?
Can you post your train.py command? Or are you making changes in your training file?
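For what it's worth, the numbers in the log can be sanity-checked with a bit of arithmetic. This is only a rough sketch: it assumes 4 bytes per element (the log says `type float`, i.e. DT_FLOAT) and that the `Stats` block reflects the allocator state at the moment of the failure. The helper name `tensor_bytes` is made up for illustration and is not part of TensorFlow.

```python
# Rough check of the allocator numbers from the log above.
# Assumption: DT_FLOAT is a 32-bit float, so 4 bytes per element.
BYTES_PER_FLOAT32 = 4


def tensor_bytes(shape, bytes_per_element=BYTES_PER_FLOAT32):
    """Return the size in bytes of a dense tensor with the given shape."""
    count = 1
    for dim in shape:
        count *= dim
    return count * bytes_per_element


# Values copied from the bfc_allocator "Stats" lines.
limit_bytes = 5856854016
in_use_bytes = 5832717824
free_bytes = limit_bytes - in_use_bytes

# Shapes copied from the two OOM messages.
conv_block2 = tensor_bytes((2, 100, 100, 512))
conv_block3 = tensor_bytes((2, 50, 50, 1024))

print("free under limit:", free_bytes, "bytes")
print("shape[2,100,100,512]:", conv_block2, "bytes")
print("shape[2,50,50,1024]:", conv_block3, "bytes")
print("fits in free space:", conv_block2 <= free_bytes, conv_block3 <= free_bytes)
```

By this estimate only about 24 MB was left under the limit, while the `[2,100,100,512]` tensor alone needs about 41 MB. The `[2,50,50,1024]` tensor (about 20 MB) would nominally fit, so its failure may be down to fragmentation or to other allocations happening at the same time; the log alone does not say which.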