tensorflow / nmt

TensorFlow Neural Machine Translation Tutorial
Apache License 2.0

UnimplementedError? #368

Open · umiao opened this issue 6 years ago

umiao commented 6 years ago

A problem occurs when I run this model under Linux. I use the following command:

```
CUDA_VISIBLE_DEVICES=2 cpulimit -l 1000 python3 nmt.py \
    --src=cn --tgt=en \
    --vocab_prefix=/tmp/nmt_data/vocab \
    --batch_size=128 \
    --train_prefix=/tmp/nmt_data/train \
    --dev_prefix=/tmp/nmt_data/eval \
    --infer_batch_size=32 \
    --test_prefix=/tmp/nmt_data/test \
    --out_dir=/tmp/nmt_model \
    --num_train_steps=675000 \
    --steps_per_stats=1000 \
    --steps_per_external_eval=5000 \
    --num_layers=2 \
    --num_units=128 \
    --dropout=0.2 \
    --metrics=bleu
```

Every time a period of 5000 steps finishes, the system throws an exception like this:


```
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1361, in _do_call
    return fn(*args)
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1340, in _run_fn
    target_list, status, run_metadata)
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/framework/errors_impl.py", line 516, in __exit__
    c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.UnimplementedError: TensorArray has size zero, but element shape [?] is not fully defined. Currently only static shapes are supported when packing zero-size TensorArrays.
	 [[Node: dynamic_seq2seq/decoder/decoder/TensorArrayStack_1/TensorArrayGatherV3 = TensorArrayGatherV3[_class=["loc:@dynamic_seq2seq/decoder/decoder/TensorArray_1"], dtype=DT_INT32, element_shape=[?], _device="/job:localhost/replica:0/task:0/device:CPU:0"](dynamic_seq2seq/decoder/decoder/TensorArray_1, dynamic_seq2seq/decoder/decoder/TensorArrayStack_1/range, dynamic_seq2seq/decoder/decoder/while/Exit_2/_109)]]
	 [[Node: dynamic_seq2seq/decoder/decoder/while/TensorArrayWrite_1/TensorArrayWriteV3/_107 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device_incarnation=1, tensor_name="edge_374_d...rayWriteV3", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"]]
```

I am really confused and have not been able to solve the problem. Can anybody help me with it?

mohammedayub44 commented 5 years ago

@umiao Did you happen to solve this? I'm also running into it when using 4 GPUs.