yanghoonkim opened this issue 7 years ago
The error above happened with TensorFlow 1.2. When I reinstalled version 1.0, I got a similar error:
INFO:tensorflow:Creating ZeroBridge in mode=eval
INFO:tensorflow:
ZeroBridge: {}
INFO:tensorflow:Starting evaluation at 2017-06-30-06:18:48
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX TITAN Black, pci bus id: 0000:02:00.0)
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:1) -> (device: 1, name: GeForce GTX TITAN Black, pci bus id: 0000:03:00.0)
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:2) -> (device: 2, name: GeForce GTX TITAN Black, pci bus id: 0000:83:00.0)
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:3) -> (device: 3, name: GeForce GTX TITAN Black, pci bus id: 0000:84:00.0)
W tensorflow/core/framework/op_kernel.cc:993] Out of range: Reached limit of 1
[[Node: dev_input_fn/parallel_read/filenames/limit_epochs/CountUpTo = CountUpTo[T=DT_INT64, _class=["loc:@dev_input_fn/parallel_read/filenames/limit_epochs/epochs"], limit=1, _device="/job:localhost/replica:0/task:0/cpu:0"](dev_input_fn/parallel_read/filenames/limit_epochs/epochs)]]
W tensorflow/core/framework/op_kernel.cc:993] Out of range: Reached limit of 1
[[Node: dev_input_fn/parallel_read_1/filenames/limit_epochs/CountUpTo = CountUpTo[T=DT_INT64, _class=["loc:@dev_input_fn/parallel_read_1/filenames/limit_epochs/epochs"], limit=1, _device="/job:localhost/replica:0/task:0/cpu:0"](dev_input_fn/parallel_read_1/filenames/limit_epochs/epochs)]]
Traceback (most recent call last):
File "/home/ad26kr/miniconda2/lib/python2.7/site-packages/tensorflow/python/ops/script_ops.py", line 82, in __call__
ret = func(*args)
File "seq2seq/metrics/metric_specs.py", line 156, in _py_func
return self.metric_fn(sliced_hypotheses, sliced_references) #pylint: disable=E1102
File "seq2seq/metrics/metric_specs.py", line 181, in metric_fn
return bleu.moses_multi_bleu(hypotheses, references, lowercase=False)
File "seq2seq/metrics/bleu.py", line 79, in moses_multi_bleu
bleu_cmd, stdin=read_pred, stderr=subprocess.STDOUT)
File "/home/ad26kr/miniconda2/lib/python2.7/subprocess.py", line 212, in check_output
process = Popen(stdout=PIPE, *popenargs, **kwargs)
File "/home/ad26kr/miniconda2/lib/python2.7/subprocess.py", line 390, in __init__
errread, errwrite)
File "/home/ad26kr/miniconda2/lib/python2.7/subprocess.py", line 1024, in _execute_child
raise child_exception
OSError: [Errno 8] Exec format error
W tensorflow/core/framework/op_kernel.cc:993] Internal: Failed to run py callback pyfunc_0: see error log.
Traceback (most recent call last):
File "/home/ad26kr/miniconda2/lib/python2.7/runpy.py", line 174, in _run_module_as_main
"__main__", fname, loader, pkg_name)
File "/home/ad26kr/miniconda2/lib/python2.7/runpy.py", line 72, in _run_code
exec code in run_globals
File "/home/ad26kr/utils/seq2seq/bin/train.py", line 277, in <module>
tf.app.run()
File "/home/ad26kr/miniconda2/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 44, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "/home/ad26kr/utils/seq2seq/bin/train.py", line 272, in main
schedule=FLAGS.schedule)
File "/home/ad26kr/miniconda2/lib/python2.7/site-packages/tensorflow/contrib/learn/python/learn/learn_runner.py", line 106, in run
return task()
File "seq2seq/contrib/experiment.py", line 112, in continuous_train_and_eval
hooks=self._eval_hooks)
File "/home/ad26kr/miniconda2/lib/python2.7/site-packages/tensorflow/python/util/deprecation.py", line 280, in new_func
return func(*args, **kwargs)
File "/home/ad26kr/miniconda2/lib/python2.7/site-packages/tensorflow/contrib/learn/python/learn/estimators/estimator.py", line 514, in evaluate
log_progress=log_progress)
File "/home/ad26kr/miniconda2/lib/python2.7/site-packages/tensorflow/contrib/learn/python/learn/estimators/estimator.py", line 836, in _evaluate_model
hooks=hooks)
File "/home/ad26kr/miniconda2/lib/python2.7/site-packages/tensorflow/contrib/training/python/training/evaluation.py", line 430, in evaluate_once
session.run(eval_ops, feed_dict)
File "/home/ad26kr/miniconda2/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py", line 462, in run
run_metadata=run_metadata)
File "/home/ad26kr/miniconda2/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py", line 786, in run
run_metadata=run_metadata)
File "/home/ad26kr/miniconda2/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py", line 744, in run
return self._sess.run(*args, **kwargs)
File "/home/ad26kr/miniconda2/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py", line 891, in run
run_metadata=run_metadata)
File "/home/ad26kr/miniconda2/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py", line 744, in run
return self._sess.run(*args, **kwargs)
File "/home/ad26kr/miniconda2/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 767, in run
run_metadata_ptr)
File "/home/ad26kr/miniconda2/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 965, in _run
feed_dict_string, options, run_metadata)
File "/home/ad26kr/miniconda2/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1015, in _do_run
target_list, options, run_metadata)
File "/home/ad26kr/miniconda2/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1035, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InternalError: Failed to run py callback pyfunc_0: see error log.
[[Node: bleu/value = PyFunc[Tin=[DT_STRING, DT_STRING], Tout=[DT_FLOAT], token="pyfunc_0", _device="/job:localhost/replica:0/task:0/cpu:0"](bleu/Identity, bleu/Identity_1)]]
[[Node: bleu/value/_349 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/gpu:0", send_device="/job:localhost/replica:0/task:0/cpu:0", send_device_incarnation=1, tensor_name="edge_714_bleu/value", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/gpu:0"]()]]
Caused by op u'bleu/value', defined at:
File "/home/ad26kr/miniconda2/lib/python2.7/runpy.py", line 174, in _run_module_as_main
"__main__", fname, loader, pkg_name)
File "/home/ad26kr/miniconda2/lib/python2.7/runpy.py", line 72, in _run_code
exec code in run_globals
File "/home/ad26kr/utils/seq2seq/bin/train.py", line 277, in <module>
tf.app.run()
File "/home/ad26kr/miniconda2/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 44, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "/home/ad26kr/utils/seq2seq/bin/train.py", line 272, in main
schedule=FLAGS.schedule)
File "/home/ad26kr/miniconda2/lib/python2.7/site-packages/tensorflow/contrib/learn/python/learn/learn_runner.py", line 106, in run
return task()
File "seq2seq/contrib/experiment.py", line 112, in continuous_train_and_eval
hooks=self._eval_hooks)
File "/home/ad26kr/miniconda2/lib/python2.7/site-packages/tensorflow/python/util/deprecation.py", line 280, in new_func
return func(*args, **kwargs)
File "/home/ad26kr/miniconda2/lib/python2.7/site-packages/tensorflow/contrib/learn/python/learn/estimators/estimator.py", line 514, in evaluate
log_progress=log_progress)
File "/home/ad26kr/miniconda2/lib/python2.7/site-packages/tensorflow/contrib/learn/python/learn/estimators/estimator.py", line 810, in _evaluate_model
eval_ops = self._get_eval_ops(features, labels, metrics)
File "/home/ad26kr/miniconda2/lib/python2.7/site-packages/tensorflow/contrib/learn/python/learn/estimators/estimator.py", line 1195, in _get_eval_ops
metrics, features, labels, model_fn_ops.predictions))
File "/home/ad26kr/miniconda2/lib/python2.7/site-packages/tensorflow/contrib/learn/python/learn/estimators/estimator.py", line 258, in _make_metrics_ops
result[name] = metric.create_metric_ops(features, labels, predictions)
File "seq2seq/metrics/metric_specs.py", line 124, in create_metric_ops
name="value")
File "/home/ad26kr/miniconda2/lib/python2.7/site-packages/tensorflow/python/ops/script_ops.py", line 189, in py_func
input=inp, token=token, Tout=Tout, name=name)
File "/home/ad26kr/miniconda2/lib/python2.7/site-packages/tensorflow/python/ops/gen_script_ops.py", line 40, in _py_func
name=name)
File "/home/ad26kr/miniconda2/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 763, in apply_op
op_def=op_def)
File "/home/ad26kr/miniconda2/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 2327, in create_op
original_op=self._default_original_op, op_def=op_def)
File "/home/ad26kr/miniconda2/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1226, in __init__
self._traceback = _extract_stack()
InternalError (see above for traceback): Failed to run py callback pyfunc_0: see error log.
[[Node: bleu/value = PyFunc[Tin=[DT_STRING, DT_STRING], Tout=[DT_FLOAT], token="pyfunc_0", _device="/job:localhost/replica:0/task:0/cpu:0"](bleu/Identity, bleu/Identity_1)]]
[[Node: bleu/value/_349 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/gpu:0", send_device="/job:localhost/replica:0/task:0/cpu:0", send_device_incarnation=1, tensor_name="edge_714_bleu/value", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/gpu:0"]()]]
This looks similar to #77 (the first 1000 training steps run fine, but then evaluation fails), but it is not the same problem.
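The root failure in the log above is `OSError: [Errno 8] Exec format error`, raised when `moses_multi_bleu` in `seq2seq/metrics/bleu.py` calls `subprocess.check_output` on the BLEU script (presumably the Moses `multi-bleu.perl`, judging by the function name). Errno 8 (ENOEXEC) means the kernel refused to execute the file. For a text script this usually means its very first bytes are not a `#!` interpreter line, for example because a failed download left an HTML page in its place. A rough check could look like the sketch below. `diagnose_script` is a hypothetical helper, not part of tf-seq2seq, and you need to pass it the path of whatever script your `bleu.py` actually runs.

```python
import os
import stat


def diagnose_script(path):
    """Guess why exec'ing `path` directly (as subprocess does) might fail.

    Hypothetical debugging helper; `path` should be the script that the
    BLEU metric tries to execute in your setup.
    """
    if not os.path.isfile(path):
        return "missing"
    # subprocess execs the file itself, so it needs an execute bit.
    if not os.stat(path).st_mode & stat.S_IXUSR:
        return "not executable"
    with open(path, "rb") as f:
        head = f.read(256)
    # A failed download can leave an HTML error page in place of the script.
    if head.lstrip().lower().startswith((b"<!doctype html", b"<html")):
        return "html page"
    # For a text script, execve() without "#!" in the first two bytes fails
    # with ENOEXEC, i.e. "[Errno 8] Exec format error".
    if not head.startswith(b"#!"):
        return "no shebang"
    return "ok"
```

For example, `print(diagnose_script("/path/to/multi-bleu.perl"))` with your real path substituted. Anything other than `ok` points at the script file rather than at TensorFlow, which would also be consistent with the error surviving the downgrade from 1.2 to 1.0.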
Have you solved the problem, my friends? I am running into the same error now.
@Lavine24 It looks like this repository won't be updated any time soon. I heard that an earlier version of tf-seq2seq works well, but I don't know which one. You may want to refer to the TensorFlow NMT tutorial instead: https://github.com/tensorflow/nmt
What I ran was the following command, which is exactly the example from the tutorial page (NMT):
ad26kr@ubuntu:~/utils/seq2seq$ python -m bin.train --config_paths=" ./example_configs/nmt_small.yml, ./example_configs/train_seq2seq.yml, ./example_configs/text_metrics_bpe.yml" --model_params " vocab_source: $VOCAB_SOURCE vocab_target: $VOCAB_TARGET" --input_pipeline_train " class: ParallelTextInputPipeline params: source_files:
$MODEL_DIR
However, I got errors (please scroll down to the bottom of the output), and I can't find any solution related to this problem.
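If the BLEU script is a valid Perl file that still cannot be exec'd directly, one workaround worth trying is to run it through the interpreter explicitly, so the kernel never has to parse a `#!` line. The sketch below mirrors the `subprocess.check_output(bleu_cmd, stdin=read_pred, stderr=subprocess.STDOUT)` call from the traceback. `run_multi_bleu` is a hypothetical function, and the `script reference < hypothesis` argument order and the `-lc` lowercase flag assume the usual Moses `multi-bleu.perl` calling convention. It will not help if the file is not a Perl script at all (e.g. an HTML error page).

```python
import subprocess


def run_multi_bleu(script_path, reference_path, hypothesis_path,
                   lowercase=False, interpreter="perl"):
    """Run a multi-bleu style script via an explicit interpreter.

    Hypothetical workaround: executing the interpreter itself means the
    script needs neither an execute bit nor a "#!" line.
    """
    cmd = [interpreter, script_path]
    if lowercase:
        cmd.append("-lc")  # assumed Moses flag for case-insensitive BLEU
    cmd.append(reference_path)
    # Hypotheses go to stdin, as in the check_output call from the traceback.
    with open(hypothesis_path, "rb") as hyp:
        out = subprocess.check_output(cmd, stdin=hyp,
                                      stderr=subprocess.STDOUT)
    return out.decode("utf-8")
```

Wiring this into `bleu.py` would be a local patch you maintain yourself, so switching to the NMT tutorial may still be the less fragile route.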