tensorflow / nmt

TensorFlow Neural Machine Translation Tutorial
Apache License 2.0

ResourceExhaustedError #303

Closed yapingzhao closed 5 years ago

yapingzhao commented 6 years ago

Hi, I use the following command for model training:

python -m nmt.nmt --src=mn --tgt=zh --vocab_prefix=nmt/tmp/nmt_data/vocab --train_prefix=nmt/tmp/nmt_data/train.mn-zh --dev_prefix=nmt/tmp/nmt_data/dev.mn-zh --test_prefix=nmt/tmp/nmt_data/test --out_dir=nmt/tmp/nmt_model --num_train_steps=1200 --steps_per_stats=100 --num_layers=2 --num_units=128 --batch_size=128 --dropout=0.2 --metrics=bleu

However, training fails with the following error:

ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape [6272,30000] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[Node: dynamic_seq2seq/decoder/output_projection/Tensordot/MatMul = MatMul[T=DT_FLOAT, _class=["loc:@gradi...d/MatMul_1"], transpose_a=false, transpose_b=false, _device="/job:localhost/replica:0/task:0/device:GPU:0"](dynamic_seq2seq/decoder/output_projection/Tensordot/Reshape, dynamic_seq2seq/decoder/output_projection/kernel/read)]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
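For context on why this particular allocation is large: the failing node is the decoder's output-projection MatMul, whose logits tensor has shape [6272, 30000]. A back-of-the-envelope sketch (assuming, which the log does not confirm, that 30000 is the target vocabulary size and 6272 is batch_size times decoded time steps, i.e. 128 * 49) shows that this single float32 tensor already needs roughly 0.7 GiB, before counting activations and gradients:

```python
# Rough memory footprint of the tensor that failed to allocate:
# the decoder output-projection logits, shape [6272, 30000], dtype float32.
# (6272 = batch_size * time steps and 30000 = vocab size are inferred
# from the command line and error message, not confirmed by the log.)

rows, cols = 6272, 30000
bytes_per_float32 = 4

logits_bytes = rows * cols * bytes_per_float32
print(f"{logits_bytes} bytes = {logits_bytes / 1024**3:.2f} GiB")
# -> 752640000 bytes = 0.70 GiB, for this one tensor alone
```

Since backprop keeps gradients and intermediate activations of similar size alive at the same time, a small GPU can run out of memory even at batch_size=128.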

Looking forward to your advice or answers. Best regards,

yapingzhao

tuvuumass commented 6 years ago

It seems that you do not have enough memory on your GPU. Try allocating more memory to your job, or run it on a GPU with more memory.

yapingzhao commented 6 years ago

Thank you very much!

ptamas88 commented 6 years ago

I was in the same situation, and I changed the GPU memory allocation to a fixed value. I don't really understand the mechanics behind the script, but I haven't faced this error since.
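The exact change isn't shown, but in the TensorFlow 1.x API (which tensorflow/nmt uses) the usual ways to control GPU memory allocation are the GPUOptions fields on the session config. A minimal sketch, assuming "a fixed value" refers to per_process_gpu_memory_fraction (pick one option, not both):

```python
import tensorflow as tf  # TensorFlow 1.x API

config = tf.ConfigProto()

# Option A: allocate GPU memory on demand instead of grabbing
# nearly all of it up front.
config.gpu_options.allow_growth = True

# Option B: cap the process at a fixed fraction of total GPU memory,
# presumably what "a fixed value" means above.
config.gpu_options.per_process_gpu_memory_fraction = 0.8

sess = tf.Session(config=config)
```

Note that neither option creates memory that isn't there: if the model itself needs more than the GPU has, the usual fix is to shrink the model's footprint, e.g. a smaller --batch_size, --num_units, or target vocabulary.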

kanghj commented 5 years ago

@yapingzhao can I ask if you managed to solve the ResourceExhaustedError? What did you need to change for it to work?