mikevoets / jama16-retina-replication

JAMA 2016; 316(22) Replication Study
https://doi.org/10.1371/journal.pone.0217541
MIT License

tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM #5

Closed: wenwenyu closed this issue 6 years ago

wenwenyu commented 6 years ago

Hi,

I'm sorry to ask this, but I have no idea how to solve this problem when running train.py. The error message is below:

System information

Error Message:

Found GPU! Using channels first as default image data format.

Traceback (most recent call last):
  File "D:\ProgramData\Anaconda3\lib\site-packages\tensorflow\python\client\session.py", line 1361, in _do_call
    return fn(*args)
  File "D:\ProgramData\Anaconda3\lib\site-packages\tensorflow\python\client\session.py", line 1340, in _run_fn
    target_list, status, run_metadata)
  File "D:\ProgramData\Anaconda3\lib\site-packages\tensorflow\python\framework\errors_impl.py", line 516, in __exit__
    c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[32,256,35,35] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
     [[Node: mixed0/concat = ConcatV2[N=4, T=DT_FLOAT, Tidx=DT_INT32, _device="/job:localhost/replica:0/task:0/device:GPU:0"](activation_5/Relu, activation_7/Relu, activation_10/Relu, activation_11/Relu, gradients/global_average_pooling2d/Mean_grad/Maximum_1/y)]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

     [[Node: GradientDescent/update/_2290 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_4395_GradientDescent/update", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File ".\train.py", line 235, in
    [global_step, mean_xentropy, train_op, update_brier])
  File "D:\ProgramData\Anaconda3\lib\site-packages\tensorflow\python\client\session.py", line 905, in run
    run_metadata_ptr)
  File "D:\ProgramData\Anaconda3\lib\site-packages\tensorflow\python\client\session.py", line 1137, in _run
    feed_dict_tensor, options, run_metadata)
  File "D:\ProgramData\Anaconda3\lib\site-packages\tensorflow\python\client\session.py", line 1355, in _do_run
    options, run_metadata)
  File "D:\ProgramData\Anaconda3\lib\site-packages\tensorflow\python\client\session.py", line 1374, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[32,256,35,35] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
     [[Node: mixed0/concat = ConcatV2[N=4, T=DT_FLOAT, Tidx=DT_INT32, _device="/job:localhost/replica:0/task:0/device:GPU:0"](activation_5/Relu, activation_7/Relu, activation_10/Relu, activation_11/Relu, gradients/global_average_pooling2d/Mean_grad/Maximum_1/y)]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

     [[Node: GradientDescent/update/_2290 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_4395_GradientDescent/update", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

Caused by op 'mixed0/concat', defined at:
  File ".\train.py", line 133, in
    include_top=False, weights='imagenet', pooling='avg', input_tensor=x)
  File "D:\ProgramData\Anaconda3\lib\site-packages\tensorflow\python\keras_impl\keras\applications\inception_v3.py", line 216, in InceptionV3
    name='mixed0')
  File "D:\ProgramData\Anaconda3\lib\site-packages\tensorflow\python\keras_impl\keras\layers\merge.py", line 665, in concatenate
    return Concatenate(axis=axis, **kwargs)(inputs)
  File "D:\ProgramData\Anaconda3\lib\site-packages\tensorflow\python\keras_impl\keras\engine\topology.py", line 258, in __call__
    output = super(Layer, self).__call__(inputs, **kwargs)
  File "D:\ProgramData\Anaconda3\lib\site-packages\tensorflow\python\layers\base.py", line 696, in __call__
    outputs = self.call(inputs, *args, **kwargs)
  File "D:\ProgramData\Anaconda3\lib\site-packages\tensorflow\python\keras_impl\keras\layers\merge.py", line 174, in call
    return self._merge_function(inputs)
  File "D:\ProgramData\Anaconda3\lib\site-packages\tensorflow\python\keras_impl\keras\layers\merge.py", line 380, in _merge_function
    return K.concatenate(inputs, axis=self.axis)
  File "D:\ProgramData\Anaconda3\lib\site-packages\tensorflow\python\keras_impl\keras\backend.py", line 2083, in concatenate
    return array_ops.concat([to_dense(x) for x in tensors], axis)
  File "D:\ProgramData\Anaconda3\lib\site-packages\tensorflow\python\ops\array_ops.py", line 1175, in concat
    return gen_array_ops._concat_v2(values=values, axis=axis, name=name)
  File "D:\ProgramData\Anaconda3\lib\site-packages\tensorflow\python\ops\gen_array_ops.py", line 777, in _concat_v2
    "ConcatV2", values=values, axis=axis, name=name)
  File "D:\ProgramData\Anaconda3\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "D:\ProgramData\Anaconda3\lib\site-packages\tensorflow\python\framework\ops.py", line 3271, in create_op
    op_def=op_def)
  File "D:\ProgramData\Anaconda3\lib\site-packages\tensorflow\python\framework\ops.py", line 1650, in __init__
    self._traceback = self._graph._extract_stack()  # pylint: disable=protected-access

ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[32,256,35,35] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
     [[Node: mixed0/concat = ConcatV2[N=4, T=DT_FLOAT, Tidx=DT_INT32, _device="/job:localhost/replica:0/task:0/device:GPU:0"](activation_5/Relu, activation_7/Relu, activation_10/Relu, activation_11/Relu, gradients/global_average_pooling2d/Mean_grad/Maximum_1/y)]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

     [[Node: GradientDescent/update/_2290 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_4395_GradientDescent/update", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
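For reference, the hint above refers to the TF 1.x RunOptions API. A minimal sketch of wiring it into the session.run call from the traceback might look like this (sess and the fetched ops are the names that appear in the traceback; the exact code in train.py may differ):

    import tensorflow as tf

    # Ask TensorFlow to report which tensors are allocated when an OOM occurs (TF 1.x).
    run_options = tf.RunOptions(report_tensor_allocations_upon_oom=True)

    # Pass the options to the session.run call shown in the traceback.
    sess.run([global_step, mean_xentropy, train_op, update_brier],
             options=run_options)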

wenwenyu commented 6 years ago

Updated system information: TensorFlow version 1.6.

wenwenyu commented 6 years ago

Hi, I set train_batch_size = 8, val_batch_size = 8, and shuffle_buffer_size = 32, and now it works.
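For anyone hitting the same error, the change amounts to lowering these hyperparameters in train.py. A rough sketch (names taken from the comment above; the defaults in the repo may differ):

    # Smaller batches shrink every per-step activation, e.g. the failing
    # [32, 256, 35, 35] tensor becomes [8, 256, 35, 35].
    train_batch_size = 8
    val_batch_size = 8
    shuffle_buffer_size = 32  # smaller shuffle buffer also reduces host memory use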

mikevoets commented 6 years ago

A ResourceExhaustedError means you should decrease the batch size, but you found that out already. I'll close this issue for now.
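A back-of-the-envelope check (my own numbers, assuming float32 activations) shows why this helps: the memory for the tensor named in the error scales linearly with batch size.

    # Size in MiB of the activation tensor from the error (float32 = 4 bytes).
    def tensor_mib(batch, channels=256, height=35, width=35, bytes_per_elem=4):
        return batch * channels * height * width * bytes_per_elem / 2**20

    print(tensor_mib(32))  # ~38.3 MiB at the original batch size
    print(tensor_mib(8))   # ~9.6 MiB at batch size 8; every activation shrinks 4x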