Closed: yanchlu closed this issue 5 years ago
recv_device="/job:localhost/replica:0/task:0/device:CPU:0"
Is this the cause of the OOM?
Hello, I ran into this problem as well. How did you end up solving it?
Reduce the batch_size a bit.
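For a rough sense of why this helps, here is a self-contained back-of-the-envelope sketch (approximate numbers, not measurements from the toolkit) of how the BERT-base attention activations scale with batch_size at a 512-token sequence length:

```python
# Rough estimate of the attention-probability memory in BERT-base
# (12 layers, 12 heads): each layer materializes a [batch, heads, seq, seq]
# float32 softmax tensor, so this memory scales linearly with batch_size
# and quadratically with sequence length.
def attention_probs_bytes(batch_size, seq_len=512, num_layers=12,
                          num_heads=12, bytes_per_float=4):
    per_layer = batch_size * num_heads * seq_len * seq_len * bytes_per_float
    return per_layer * num_layers

for bs in (12, 6, 3):
    print(bs, round(attention_probs_bytes(bs) / 2**30, 2), "GiB")
# 12 -> 1.69 GiB, 6 -> 0.84 GiB, 3 -> 0.42 GiB (softmax tensors alone,
# before hidden states, gradients, and optimizer slots)
```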
Hey @yanchlu, were you able to solve the issue?
I modified the batch_size, but the results were not satisfactory.
@yanchlu Thank you for the quick response.
I ran this program on 3 GPUs, but it still hits an OOM.
ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[12,12,512,512] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
	 [[node bert/encoder/layer_6/attention/self/Softmax (defined at /data2/wangfuyu/NQ/ycl/SMRCToolkit-master/sogou_mrc/libraries/modeling.py:728) = Softmax[T=DT_FLOAT, _class=["loc:@bert/encoder/layer_6/attention/self/cond/Switch_1"], _device="/job:localhost/replica:0/task:0/device:GPU:0"](bert/encoder/layer_6/attention/self/add)]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
	 [[{{node truediv/_771}} = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_5803_truediv", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]
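Following the hint at the end of that message, here is a minimal TF1 sketch of enabling report_tensor_allocations_upon_oom so TensorFlow prints the live allocations when the OOM happens (the commented sess.run call is a placeholder for whatever the toolkit's training loop actually fetches):

```python
import tensorflow as tf

# Ask TF1 to dump the list of currently allocated tensors if a run OOMs.
run_options = tf.RunOptions(report_tensor_allocations_upon_oom=True)

# Placeholder: pass the options into whichever sess.run the training loop
# executes, e.g.
# sess.run(train_op, feed_dict=feed, options=run_options)
```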