tensorflow / tensor2tensor

Library of deep learning models and datasets designed to make deep learning more accessible and accelerate ML research.
Apache License 2.0

Transformer ASR, not able to import MetaGraph, KeyError: u'InfeedEnqueueTuple' #986

Closed. ajithAI closed this issue 6 years ago.

ajithAI commented 6 years ago

Description

Hi. I am trying to load the MetaGraph for inference without TF eager execution (ASR Transformer, https://tensorflow.github.io/tensor2tensor/tutorials/asr_with_transformer.html) and I get an error: KeyError: u'InfeedEnqueueTuple'.

Did I miss any module to import?

Environment information

TF 1.9, Python 3

If there is any reference for running inference without eager execution, that would be very helpful.

Error logs:

Traceback:

KeyError                                  Traceback (most recent call last)
<ipython-input> in <module>()
      1 sess = tf.Session(config=tf.ConfigProto(allow_soft_placement=True))
----> 2 saver = tf.train.import_meta_graph("checkpoints/transformer_asr_180214/model.ckpt-230000.meta")

/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/saver.pyc in import_meta_graph(meta_graph_or_file, clear_devices, import_scope, **kwargs)
   1958       clear_devices=clear_devices,
   1959       import_scope=import_scope,
-> 1960       **kwargs)
   1961
   1962   if meta_graph_def.HasField("saver_def"):

/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/meta_graph.pyc in import_scoped_meta_graph(meta_graph_or_file, clear_devices, graph, import_scope, input_map, unbound_inputs_col_name, restore_collections_predicate)
    742           name=(import_scope or scope_to_prepend_to_names),
    743           input_map=input_map,
--> 744           producer_op_list=producer_op_list)
    745
    746   # Restores all the other collections.

/usr/local/lib/python2.7/dist-packages/tensorflow/python/util/deprecation.pyc in new_func(*args, **kwargs)
    430           'in a future version' if date is None else ('after %s' % date),
    431           instructions)
--> 432       return func(*args, **kwargs)
    433     return tf_decorator.make_decorator(func, new_func, 'deprecated',
    434                                        _add_deprecated_arg_notice_to_docstring(

/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/importer.py in import_graph_def(graph_def, input_map, return_elements, name, op_dict, producer_op_list)
    389   if producer_op_list is not None:
    390     # TODO(skyewm): make a copy of graph_def so we're not mutating the argument?
--> 391     _RemoveDefaultAttrs(op_dict, producer_op_list, graph_def)
    392     #pass
    393

/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/importer.py in _RemoveDefaultAttrs(op_dict, producer_op_list, graph_def)
    156     # Remove any default attr values that aren't in op_def.
    157     if node.op in producer_op_dict:
--> 158       op_def = op_dict[node.op]
    159       producer_op_def = producer_op_dict[node.op]
    160       # We make a copy of node.attr to iterate through since we may modify

KeyError: u'InfeedEnqueueTuple'
sxsxsx commented 6 years ago

I have met the same problem. Have you solved it? @ajithAI

ajithAI commented 6 years ago

Yup. Include the following:

from tensor2tensor import models
from tensor2tensor import problems
from tensor2tensor.layers import common_layers
from tensor2tensor.utils import trainer_lib
from tensor2tensor.utils import t2t_model
from tensor2tensor.utils import registry

Some module from the above is needed.
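With those imports executed first, the original loading code from the traceback (same checkpoint path) gets past the KeyError; a minimal sketch:

import tensorflow as tf

# The tensor2tensor imports above must run before import_meta_graph, since
# they appear to register the extra op definitions (including the TPU
# infeed ops) that the exported graph refers to.
sess = tf.Session(config=tf.ConfigProto(allow_soft_placement=True))
saver = tf.train.import_meta_graph(
    "checkpoints/transformer_asr_180214/model.ckpt-230000.meta")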

After that I ran into another issue.

I am not able to restore the weights. The error is: InvalidArgumentError: No OpKernel was registered to support Op 'InfeedEnqueueTuple' with these attrs. Registered devices: [CPU], Registered kernels:

stefan-falk commented 6 years ago

@ajithAI This is not working for me ..

I am executing

import os

import tensorflow as tf

from tensor2tensor import models
from tensor2tensor import problems
from tensor2tensor.layers import common_layers
from tensor2tensor.utils import trainer_lib
from tensor2tensor.utils import t2t_model
from tensor2tensor.utils import registry

def main():

    latest_ckpt = tf.train.latest_checkpoint('/data/workspaces/git/speech/asr/shell/model/transformer')

    meta_graph_path = latest_ckpt + '.meta'

    with tf.Session() as sess:
        saver = tf.train.import_meta_graph(meta_graph_path)
        saver.restore(sess, latest_ckpt)

    print('All done.')

if __name__ == '__main__':
    main()

but I get:


InvalidArgumentError (see above for traceback): Restoring from checkpoint failed. This is most likely due to a mismatch between the current graph and the graph from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:

Cannot assign a device for operation 'transformer/parallel_0_5/transformer/transformer/body/encoder/layer_0/self_attention/multihead_attention/q/Tensordot/ListDiff': Could not satisfy explicit device specification '/device:GPU:0' because no supported kernel for GPU devices is available.
Registered kernels:
  device='CPU'; T in [DT_STRING]; out_idx in [DT_INT64]
  device='CPU'; T in [DT_STRING]; out_idx in [DT_INT32]
  device='CPU'; T in [DT_DOUBLE]; out_idx in [DT_INT64]
  device='CPU'; T in [DT_DOUBLE]; out_idx in [DT_INT32]
  device='CPU'; T in [DT_FLOAT]; out_idx in [DT_INT64]
  device='CPU'; T in [DT_FLOAT]; out_idx in [DT_INT32]
  device='CPU'; T in [DT_BFLOAT16]; out_idx in [DT_INT64]
  device='CPU'; T in [DT_BFLOAT16]; out_idx in [DT_INT32]
  device='CPU'; T in [DT_HALF]; out_idx in [DT_INT64]
  device='CPU'; T in [DT_HALF]; out_idx in [DT_INT32]
  device='CPU'; T in [DT_INT8]; out_idx in [DT_INT64]
  device='CPU'; T in [DT_INT8]; out_idx in [DT_INT32]
  device='CPU'; T in [DT_UINT8]; out_idx in [DT_INT64]
  device='CPU'; T in [DT_UINT8]; out_idx in [DT_INT32]
  device='CPU'; T in [DT_INT16]; out_idx in [DT_INT64]
  device='CPU'; T in [DT_INT16]; out_idx in [DT_INT32]
  device='CPU'; T in [DT_UINT16]; out_idx in [DT_INT64]
  device='CPU'; T in [DT_UINT16]; out_idx in [DT_INT32]
  device='CPU'; T in [DT_INT32]; out_idx in [DT_INT64]
  device='CPU'; T in [DT_INT32]; out_idx in [DT_INT32]
  device='CPU'; T in [DT_INT64]; out_idx in [DT_INT64]
  device='CPU'; T in [DT_INT64]; out_idx in [DT_INT32]

     [[Node: transformer/parallel_0_5/transformer/transformer/body/encoder/layer_0/self_attention/multihead_attention/q/Tensordot/ListDiff = ListDiff[T=DT_INT32, out_idx=DT_INT32, _device="/device:GPU:0"](transformer/parallel_0_5/transformer/transformer/body/encoder/layer_0/self_attention/multihead_attention/q/Tensordot/range, transformer/parallel_0_5/transformer/transformer/body/encoder/layer_0/self_attention/multihead_attention/q/Tensordot/add_1)]]

Any idea how to fix this? ..

ajithAI commented 6 years ago

This is because the MetaGraph does not fit with the checkpoint index.

So I loaded the model from tensor2tensor as below:

problem_name = "librispeech_clean"
asr_problem = problems.problem(problem_name)
encoders = asr_problem.feature_encoders(None)

model_name = "transformer"
hparams_set = "transformer_librispeech_tpu"

hparams = trainer_lib.create_hparams(hparams_set, data_dir="./", problem_name=problem_name)
asr_model = registry.model(model_name)(hparams, Modes.PREDICT)
inf = asr_model.infer(inputs, beam_size=2, alpha=0.6, decode_length=1)["outputs"]

and I loaded the weights from the trained model:

saver = tf.train.Saver()
saver.restore(sess, tf.train.latest_checkpoint('checkpoints/transformer_asr_180214/'))

Now you can run sess.run().
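For completeness, the full flow would look roughly like this. This is a sketch, not the exact code I ran: Modes is tf.estimator.ModeKeys, and the placeholder shape for the audio features is an assumption that you should adapt to your own preprocessing.

import tensorflow as tf
from tensor2tensor import problems
from tensor2tensor.utils import trainer_lib
from tensor2tensor.utils import registry

Modes = tf.estimator.ModeKeys

# Problem, model, and hparams, as above.
problem_name = "librispeech_clean"
asr_problem = problems.problem(problem_name)
encoders = asr_problem.feature_encoders(None)

model_name = "transformer"
hparams_set = "transformer_librispeech_tpu"
hparams = trainer_lib.create_hparams(hparams_set, data_dir="./",
                                     problem_name=problem_name)

# Hypothetical placeholder for batched audio features; the shape depends
# on how you preprocess the audio for the librispeech problem.
inputs_ph = tf.placeholder(tf.float32, shape=[None, None, 80, 1], name="inputs")
features = {"inputs": inputs_ph}

# Build the inference graph directly from tensor2tensor instead of
# importing the TPU-exported MetaGraph.
asr_model = registry.model(model_name)(hparams, Modes.PREDICT)
outputs = asr_model.infer(features, beam_size=2, alpha=0.6,
                          decode_length=1)["outputs"]

with tf.Session() as sess:
    saver = tf.train.Saver()
    saver.restore(sess, tf.train.latest_checkpoint(
        'checkpoints/transformer_asr_180214/'))
    # decoded_ids = sess.run(outputs, feed_dict={inputs_ph: my_features})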

ajithAI commented 6 years ago

(Quoting stefan-falk's comment above in full.)

Try:

sess = tf.Session(config=tf.ConfigProto(allow_soft_placement=True, log_device_placement=True))
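In context, the loading code above becomes something like this (a sketch; allow_soft_placement lets TensorFlow fall back to a CPU kernel when an op such as ListDiff is pinned to /device:GPU:0 but has no GPU kernel registered):

config = tf.ConfigProto(allow_soft_placement=True, log_device_placement=True)

with tf.Session(config=config) as sess:
    # Soft placement moves GPU-pinned ops without GPU kernels back to the CPU.
    saver = tf.train.import_meta_graph(meta_graph_path)
    saver.restore(sess, latest_ckpt)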

Punchwes commented 5 years ago

@ajithAI Hello, have you solved the 'No OpKernel' error? I have also encountered this problem and could not find a solution.