tensorflow / lingvo


Loading a saved Lingvo model from its .meta file #193

Open smdshakeelhassan opened 4 years ago

smdshakeelhassan commented 4 years ago

Hi. I am trying to load a TensorFlow meta graph from a saved Lingvo checkpoint using TensorFlow 1.15, so that I can convert it to a SavedModel for TensorFlow Serving. I am using the following code.

import sys

import tensorflow as tf
from tensorflow.python.saved_model import signature_constants
from tensorflow.python.saved_model import tag_constants

if len(sys.argv) != 2:
    print("Usage: " + sys.argv[0] + " save_dir")
    exit(1)
export_dir = sys.argv[1]

builder = tf.compat.v1.saved_model.builder.SavedModelBuilder(export_dir)
sigs = {}
with tf.Session(graph=tf.Graph()) as sess:
    # Restore the training graph and weights from the Lingvo checkpoint.
    new_saver = tf.train.import_meta_graph("./serv_test/ckpt-00020147.meta")
    new_saver.restore(sess, tf.train.latest_checkpoint("./serv_test"))
    graph = tf.get_default_graph()
    input_audio = graph.get_tensor_by_name('inference/default/wav:0')
    output_hyps = graph.get_tensor_by_name('inference/default/Reshape_7:0')
    sigs[signature_constants.DEFAULT_SERVING_SIGNATURE_DEF_KEY] = (
        tf.saved_model.signature_def_utils.predict_signature_def(
            {"in": input_audio}, {"out": output_hyps}))
    builder.add_meta_graph_and_variables(
        sess, [tag_constants.SERVING], signature_def_map=sigs)
builder.save()

But I am getting the following error in the import_meta_graph line:

Traceback (most recent call last):
  File "xport.py", line 17, in <module>
    saver=tf.train.import_meta_graph("./serv_test/ckpt-00020147.meta")
  File "/home/ubuntu/tf1.15/lib/python3.6/site-packages/tensorflow_core/python/training/saver.py", line 1453, in import_meta_graph
    **kwargs)[0]
  File "/home/ubuntu/tf1.15/lib/python3.6/site-packages/tensorflow_core/python/training/saver.py", line 1477, in _import_meta_graph_with_return_elements
    **kwargs))
  File "/home/ubuntu/tf1.15/lib/python3.6/site-packages/tensorflow_core/python/framework/meta_graph.py", line 809, in import_scoped_meta_graph_with_return_elements
    return_elements=return_elements)
  File "/home/ubuntu/tf1.15/lib/python3.6/site-packages/tensorflow_core/python/util/deprecation.py", line 507, in new_func
    return func(*args, **kwargs)
  File "/home/ubuntu/tf1.15/lib/python3.6/site-packages/tensorflow_core/python/framework/importer.py", line 405, in import_graph_def
    producer_op_list=producer_op_list)
  File "/home/ubuntu/tf1.15/lib/python3.6/site-packages/tensorflow_core/python/framework/importer.py", line 501, in _import_graph_def_internal
    graph._c_graph, serialized, options)  # pylint: disable=protected-access
tensorflow.python.framework.errors_impl.NotFoundError: Op type not registered 
'GenericInput' in binary running on ip-10-1-21-241. Make sure the Op and Kernel 
are registered in the binary running in this process. Note that if you are loading a 
saved graph which used ops from tf.contrib, accessing (e.g.) `tf.contrib.resampler` 
should be done before importing the graph, as contrib ops are lazily registered when
the module is first accessed.

Is there any way to get around this error? Is it because of the custom-built layers used in Lingvo? Is there any tutorial on making a Lingvo model servable with TensorFlow Serving? Thanks.

jonathanasdf commented 4 years ago

It is not possible to use Lingvo custom ops with standard serving setups, but fortunately most of the custom ops, e.g. GenericInput, are part of the training input pipeline and are not necessary for serving.

Instead of importing the training graph, you should export an inference graph for your model using https://github.com/tensorflow/lingvo/blob/master/lingvo/core/inference_graph_exporter.py

You can then manually convert the inference graph into a SavedModel or just use tf.import_graph_def() directly.
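
For reference, here is a minimal sketch of that conversion, assuming the exporter output has been saved as a frozen GraphDef (if it is an InferenceGraph proto, take its graph_def field first); the file path, tensor names, and signature keys below are placeholders, so adjust them to whatever your export actually contains:

import tensorflow as tf
from tensorflow.python.saved_model import signature_constants
from tensorflow.python.saved_model import tag_constants

# Load the exported (frozen) inference GraphDef; the path is a placeholder.
graph_def = tf.compat.v1.GraphDef()
with tf.io.gfile.GFile("/tmp/saved_inference_graph/inference_graph.pb", "rb") as f:
    graph_def.ParseFromString(f.read())

with tf.Graph().as_default() as graph:
    tf.import_graph_def(graph_def, name="")
    # Tensor names are placeholders; look up the real feed/fetch names in your export.
    input_audio = graph.get_tensor_by_name("inference/default/wav:0")
    output_hyps = graph.get_tensor_by_name("inference/default/Reshape_7:0")
    with tf.compat.v1.Session(graph=graph) as sess:
        builder = tf.compat.v1.saved_model.builder.SavedModelBuilder("/tmp/serving_model")
        sig = tf.compat.v1.saved_model.signature_def_utils.predict_signature_def(
            {"in": input_audio}, {"out": output_hyps})
        builder.add_meta_graph_and_variables(
            sess, [tag_constants.SERVING],
            signature_def_map={
                signature_constants.DEFAULT_SERVING_SIGNATURE_DEF_KEY: sig
            })
        builder.save()

If you only need to run inference in-process rather than serve it, you can stop after tf.import_graph_def() and sess.run() the fetch tensors directly.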

smdshakeelhassan commented 4 years ago

Hey. Thanks for replying. I am using the following code to get the inference graph.

import tensorflow as tf
from lingvo import model_imports
from lingvo import model_registry
from lingvo.core import inference_graph_exporter

checkpoint = tf.train.latest_checkpoint('/tmp/ebs/lingvo/librispeech/serv_test')
print('Using checkpoint %s' % checkpoint)

params = model_registry.GetParams('asr.librispeech.Librispeech960Wpm', 'Test')
inference_graph = inference_graph_exporter.InferenceGraphExporter.Export(
    params,
    freeze_checkpoint=checkpoint,
    export_path="/tmp/saved_inference_graph")

But I am getting the error:

Traceback (most recent call last):
  File "/tmp/lingvo/core/ops/__init__.py", line 27, in <module>
    from lingvo.core.ops import gen_x_ops
ImportError: cannot import name 'gen_x_ops'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/root/.cache/bazel/_bazel_root/d42b9c57d24cf5db3bd8d332dc35437f/execroot/__main__/bazel-out/k8-opt/bin/inferencer.runfiles/__main__/inferencer.py", line 9, in <module>
    from lingvo import model_registry
  File "/tmp/lingvo/model_registry.py", line 29, in <module>
    from lingvo.core import base_model_params
  File "/tmp/lingvo/core/base_model_params.py", line 21, in <module>
    from lingvo.core import base_input_generator
  File "/tmp/lingvo/core/base_input_generator.py", line 23, in <module>
    from lingvo.core import base_layer
  File "/tmp/lingvo/core/base_layer.py", line 25, in <module>
    from lingvo.core import cluster_factory
  File "/tmp/lingvo/core/cluster_factory.py", line 21, in <module>
    from lingvo.core import cluster
  File "/tmp/lingvo/core/cluster.py", line 26, in <module>
    from lingvo.core import py_utils
  File "/tmp/lingvo/core/py_utils.py", line 35, in <module>
    from lingvo.core import ops
  File "/tmp/lingvo/core/ops/__init__.py", line 31, in <module>
    tf.resource_loader.get_path_to_datafile('x_ops.so'))
  File "/usr/local/lib/python3.5/dist-packages/tensorflow_core/python/framework/load_library.py", line 61, in load_op_library
    lib_handle = py_tf.TF_LoadLibrary(library_filename)
tensorflow.python.framework.errors_impl.NotFoundError: /tmp/lingvo/core/ops/x_ops.so: cannot open shared object file: No such file or directory

Can you help me out with this error?

smdshakeelhassan commented 4 years ago

I am using the Docker container started with the command

docker run --rm $(test "$LINGVO_DEVICE" = "gpu" && echo "--runtime=nvidia") -it -v ${LINGVO_DIR}:/tmp -v ${HOME}/.gitconfig:/home/${USER}/.gitconfig:ro -p 6006:6006 -p 8888:8888 --name lingvo_asr tensorflow:lingvo bash

on Ubuntu 18.04. In #75 it was mentioned:

It looks like the problem is with the .so being compiled for ubuntu but trying to be loaded into osx. Did you start the colab kernel from inside of docker?

Originally posted by @jonathanasdf in https://github.com/tensorflow/lingvo/issues/75#issuecomment-488098097

But I am using Ubuntu, not macOS.

jonathanasdf commented 4 years ago

Lingvo includes both Python and C++ code. x_ops.so is C++ code compiled into a shared library for Python to use. When you clone the GitHub repository you get the uncompiled source code, so in order to run it you need to compile it and run it with bazel, rather than just calling "python some_script.py".

If you just want to use the library and don't expect to make any changes to the core framework, you can consider installing the pip package instead, which has all the C++ parts precompiled.

If you want to continue using the sources, you should add a BUILD rule entry for your new file and build/run it with bazel.
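
As a quick sanity check that the custom ops are actually available (a sketch, assuming you installed the pip package, e.g. pip3 install lingvo, or built the sources with bazel):

# This import loads the precompiled x_ops.so custom-op library; it is exactly
# this step that fails with NotFoundError when the shared library is missing,
# as in the traceback above.
from lingvo.core import ops
from lingvo.core.ops import gen_x_ops

# If both imports succeed, the custom ops (GenericInput among them) are
# registered with the TensorFlow runtime in this process.
print([name for name in dir(gen_x_ops) if 'generic_input' in name])

If that import fails inside your container, the x_ops.so shared library has not been built for the environment you are running in.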