tensorflow / tensor2tensor

Library of deep learning models and datasets designed to make deep learning more accessible and accelerate ML research.
Apache License 2.0

Error in training speech recognition #1283

Closed: manuel3265 closed this issue 5 years ago

manuel3265 commented 5 years ago

Description

I followed the tutorial at https://cloud.google.com/tpu/docs/tutorials/automated-speech-recognition and completed the t2t-datagen step, but when I run the t2t-trainer step, the following error appears:

Error recorded from training_loop: NodeDef mentions attr 'use_inter_op_parallelism' not in Op<name=ParallelMapDataset; signature=input_dataset:variant, other_arguments:, num_parallel_calls:int32 -> handle:variant; attr=f:func; attr=Targuments:list(type),min=0; attr=output_types:list(type),min=1; attr=output_shapes:list(shape),min=1>; NodeDef: node input_pipeline_task0/ParallelMapDataset (defined at /usr/local/lib/python2.7/dist-packages/tensorflow/contrib/tpu/python/tpu/tpu_estimator.py:3043) = ParallelMapDataset[Targuments=[], _class=["loc:@input_pipeline_task0/MakeIterator"], f=tf_data_structured_function_wrapper_7BUKT3wHiUM[], output_shapes=[[1], [?,80,3], [?]], output_types=[DT_INT64, DT_FLOAT, DT_INT64], use_inter_op_parallelism=true, _device="/job:worker/replica:0/task:0/device:CPU:0"](input_pipeline_task0/ParallelInterleaveDataset, input_pipeline_task0/num_parallel_calls). (Check whether your GraphDef-interpreting binary is up to date with your GraphDef-generating binary.).
INFO:tensorflow:training_loop marked as finished

...

Environment information

I'm running this on a Google Cloud virtual machine with Debian (4 vCPUs, 15 GB of memory).

OS: Debian GNU/Linux 9.6 (stretch)

$ pip freeze | grep tensor
mesh-tensorflow==0.0.4
tensor2tensor==1.11.0
tensorboard==1.12.0
tensorflow==1.12.0
tensorflow-metadata==0.9.0
tensorflow-probability==0.5.0

$ python -V
Python 2.7.13

For bugs: reproduction and error logs

# Steps to reproduce:
...
# Error logs:
INFO:tensorflow:training_loop marked as finished
WARNING:tensorflow:Reraising captured error
Traceback (most recent call last):
  File "/usr/local/bin/t2t-trainer", line 33, in <module>
    tf.app.run()
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 125, in run
    _sys.exit(main(argv))
  File "/usr/local/bin/t2t-trainer", line 28, in main
    t2t_trainer.main(argv)
  File "/usr/local/lib/python2.7/dist-packages/tensor2tensor/bin/t2t_trainer.py", line 387, in main
    execute_schedule(exp)
  File "/usr/local/lib/python2.7/dist-packages/tensor2tensor/bin/t2t_trainer.py", line 349, in execute_schedule
    getattr(exp, FLAGS.schedule)()
  File "/usr/local/lib/python2.7/dist-packages/tensor2tensor/utils/trainer_lib.py", line 438, in continuous_train_and_eval
    self._eval_spec)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/estimator/training.py", line 471, in train_and_evaluate
    return executor.run()
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/estimator/training.py", line 637, in run
    getattr(self, task_to_run)()
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/estimator/training.py", line 647, in run_worker
    return self._start_distributed_training()
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/estimator/training.py", line 788, in _start_distributed_training
    saving_listeners=saving_listeners)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/tpu/python/tpu/tpu_estimator.py", line 2409, in train
    rendezvous.raise_errors()
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/tpu/python/tpu/error_handling.py", line 128, in raise_errors
    six.reraise(typ, value, traceback)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/tpu/python/tpu/tpu_estimator.py", line 2403, in train
    saving_listeners=saving_listeners
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/estimator/estimator.py", line 354, in train
    loss = self._train_model(input_fn, hooks, saving_listeners)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/estimator/estimator.py", line 1207, in _train_model
    return self._train_model_default(input_fn, hooks, saving_listeners)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/estimator/estimator.py", line 1241, in _train_model_default
    saving_listeners)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/estimator/estimator.py", line 1468, in _train_with_estimator_spec
    log_step_count_steps=log_step_count_steps) as mon_sess:
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/monitored_session.py", line 504, in MonitoredTrainingSession
    stop_grace_period_secs=stop_grace_period_secs)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/monitored_session.py", line 921, in __init__
    stop_grace_period_secs=stop_grace_period_secs)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/monitored_session.py", line 643, in __init__
    self._sess = _RecoverableSession(self._coordinated_creator)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/monitored_session.py", line 1107, in __init__
    _WrappedSession.__init__(self, self._create_session())
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/monitored_session.py", line 1112, in _create_session
    return self._sess_creator.create_session()
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/monitored_session.py", line 807, in create_session
    hook.after_create_session(self.tf_sess, self.coord)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/estimator/util.py", line 147, in after_create_session
    session.run(self._initializers)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 929, in run
    run_metadata_ptr)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1152, in _run
    feed_dict_tensor, options, run_metadata)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1328, in _do_run
    run_metadata)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1348, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: NodeDef mentions attr 'use_inter_op_parallelism' not in Op<name=ParallelMapDataset; signature=input_dataset:variant, other_arguments:, num_parallel_calls:int32 -> handle:variant; attr=f:func; attr=Targuments:list(type),min=0; attr=output_types:list(type),min=1; attr=output_shapes:list(shape),min=1>; NodeDef: node input_pipeline_task0/ParallelMapDataset (defined at /usr/local/lib/python2.7/dist-packages/tensorflow/contrib/tpu/python/tpu/tpu_estimator.py:3043)  = ParallelMapDataset[Targuments=[], _class=["loc:@input_pipeline_task0/MakeIterator"], f=tf_data_structured_function_wrapper_7BUKT3wHiUM[], output_shapes=[[1], [?,80,3], [?]], output_types=[DT_INT64, DT_FLOAT, DT_INT64], use_inter_op_parallelism=true, _device="/job:worker/replica:0/task:0/device:CPU:0"](input_pipeline_task0/ParallelInterleaveDataset, input_pipeline_task0/num_parallel_calls). (Check whether your GraphDef-interpreting binary is up to date with your GraphDef-generating binary.).
...
manuel3265 commented 5 years ago

This error is due to the TensorFlow version. In my case, the TensorFlow version is 1.12 on both the TPU and the VM.
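
For anyone hitting the same message, here is a minimal sketch of the kind of check the error hint ("GraphDef-interpreting binary ... GraphDef-generating binary") points at: comparing the TensorFlow release on the VM with the one on the TPU. The helper names `major_minor` and `versions_match` are hypothetical, and the TPU-side value below is an illustrative placeholder, not something reported in this issue. On the VM, the version can be read from `tf.__version__` or from `pip freeze`; how the TPU version is obtained is left outside this sketch.

```python
def major_minor(version):
    """Return (major, minor) as ints from a version string such as '1.12.0'."""
    parts = version.strip().split(".")
    return int(parts[0]), int(parts[1])


def versions_match(client_version, tpu_version):
    """Return True when both versions share the same major.minor release."""
    return major_minor(client_version) == major_minor(tpu_version)


# VM-side version, taken from the `pip freeze` output in this issue.
vm_version = "1.12.0"
# Hypothetical TPU-side value, for illustration only; the issue does not
# state which version the TPU was running when the error occurred.
tpu_version = "1.11"

if not versions_match(vm_version, tpu_version):
    print("TensorFlow mismatch: VM %s vs TPU %s" % (vm_version, tpu_version))
```

Comparing only major.minor is a design choice of this sketch: it treats patch releases such as 1.12.0 and 1.12.1 as compatible, which is an assumption rather than a documented guarantee.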