tensorflow / serving

A flexible, high-performance serving system for machine learning models
https://www.tensorflow.org/serving
Apache License 2.0

Export nmt trained model to tensorflow serving #712

Closed luozhouyang closed 5 years ago

luozhouyang commented 6 years ago

I have trained NMT models, but I cannot figure out how to export them to TensorFlow Serving. I read the documentation for MNIST and Inception, but I think those models are different from NMT models. Can you add a demo showing how to export NMT models? This would be a great help to beginners like me, thanks!

aforwardz commented 6 years ago

export saved model -- this is my answer on Stack Overflow. Hope that helps. You also need to define your signature.

luozhouyang commented 6 years ago

@aforwardz thanks for your reply. Here is my code:

        export_path = os.path.join(
            tf.compat.as_bytes(self.export_base_path),
            tf.compat.as_bytes(str(self.version_number))
        )

        # with tf.device('/gpu:0'):
        sess = tf.Session()
        saver = tf.train.import_meta_graph(os.path.join(self.model_dir, "translate.ckpt-21000.meta"))
        latest_ckpt = tf.train.latest_checkpoint(self.model_dir)
        saver.restore(sess, latest_ckpt)

        builder = tf.saved_model.builder.SavedModelBuilder(export_path)

        # I am not sure this way to create PREDICT_INPUTS and PREDICT_OUTPUTS is right or not.
        feature_configs = {
            'x': tf.VarLenFeature(shape=[], dtype=tf.string),
            'y': tf.VarLenFeature(shape=[], dtype=tf.string)
        }
        serialized_example = tf.placeholder(tf.string, name="tf_example")
        tf_example = tf.parse_example(serialized_example, feature_configs)
        x = tf.identity(tf_example['x'], name='x')
        y = tf.identity(tf_example['y'], name='y')
        predict_input = tf.saved_model.utils.build_tensor_info(x)
        predict_output = tf.saved_model.utils.build_tensor_info(y)
        predict_signature_def_map = tf.saved_model.signature_def_utils.predict_signature_def(
            inputs={
                tf.saved_model.signature_constants.PREDICT_INPUTS: predict_input
            },
            outputs={
                tf.saved_model.signature_constants.PREDICT_OUTPUTS: predict_output
            }
        )

        legacy_init_op = tf.group(tf.tables_initializer(), name="legacy_init_op")
        builder.add_meta_graph_and_variables(
            sess=sess,
            tags=[tf.saved_model.tag_constants.SERVING],
            signature_def_map={
                "predict_signature_map": predict_signature_def_map
            },
            legacy_init_op=legacy_init_op,
            assets_collection=None
        )
        builder.save()

But errors occur:

2018-01-05 17:20:31.485773: I C:\tf_jenkins\home\workspace\tf-nightly-windows\M\windows-gpu\PY\35\tensorflow\core\common_runtime\gpu\gpu_device.cc:1154] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: GeForce GTX 950, pci bus id: 0000:01:00.0, compute capability: 5.2)
Traceback (most recent call last):
  File "D:\ProgramFiles\Python\Python35\lib\site-packages\tensorflow\python\client\session.py", line 1323, in _do_call
    return fn(*args)
  File "D:\ProgramFiles\Python\Python35\lib\site-packages\tensorflow\python\client\session.py", line 1293, in _run_fn
    self._extend_graph()
  File "D:\ProgramFiles\Python\Python35\lib\site-packages\tensorflow\python\client\session.py", line 1354, in _extend_graph
    self._session, graph_def.SerializeToString(), status)
  File "D:\ProgramFiles\Python\Python35\lib\site-packages\tensorflow\python\framework\errors_impl.py", line 473, in __exit__
    c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.InvalidArgumentError: Cannot assign a device for operation 'gradients/dynamic_seq2seq/encoder/embedding_lookup_grad/ToInt32': Could not satisfy explicit device specification '' because the node was colocated with a group of nodes that required incompatible device '/device:GPU:0'
Colocation Debug Info:
Colocation group had the following types and devices: 
Assign: GPU CPU 
Const: GPU CPU 
StridedSlice: GPU CPU 
TensorArrayScatterV3: GPU CPU 
Cast: GPU CPU 
Identity: GPU CPU 
StackV2: GPU CPU 
Sub: GPU CPU 
Enter: GPU CPU 
VariableV2: GPU CPU 
RandomUniform: GPU CPU 
ScatterSub: GPU CPU 
Neg: GPU CPU 
Mul: GPU CPU 
Add: GPU CPU 
L2Loss: CPU 
Size: GPU CPU 
TensorArrayV3: GPU CPU 
ExpandDims: GPU CPU 
Reshape: GPU CPU 
ConcatV2: GPU CPU 
TensorArrayReadV3: GPU CPU 
Gather: GPU CPU 
StackPopV2: GPU CPU 
RealDiv: GPU CPU 
BroadcastGradientArgs: GPU CPU 
FloorMod: GPU CPU 
ShapeN: GPU CPU 
ConcatOffset: GPU CPU 
StackPushV2: GPU CPU 
TensorArrayGradV3: GPU CPU 
TensorArrayGatherV3: GPU CPU 
Shape: GPU CPU 
Floor: GPU CPU 
MatMul: GPU CPU 
Slice: GPU CPU 
Sum: GPU CPU 
TensorArrayWriteV3: GPU CPU 
     [[Node: gradients/dynamic_seq2seq/encoder/embedding_lookup_grad/ToInt32 = Cast[DstT=DT_INT32, SrcT=DT_INT64, _class=["loc:@embeddings/encoder/embedding_encoder"]](gradients/dynamic_seq2seq/encoder/embedding_lookup_grad/Shape)]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "E:/PyCharmProjects/GNMT/nmt-demo/export/exporter.py", line 106, in <module>
    exporter.export()
  File "E:/PyCharmProjects/GNMT/nmt-demo/export/exporter.py", line 26, in export
    saver.restore(sess, latest_ckpt)
  File "D:\ProgramFiles\Python\Python35\lib\site-packages\tensorflow\python\training\saver.py", line 1683, in restore
    {self.saver_def.filename_tensor_name: save_path})
  File "D:\ProgramFiles\Python\Python35\lib\site-packages\tensorflow\python\client\session.py", line 889, in run
    run_metadata_ptr)
  File "D:\ProgramFiles\Python\Python35\lib\site-packages\tensorflow\python\client\session.py", line 1120, in _run
    feed_dict_tensor, options, run_metadata)
  File "D:\ProgramFiles\Python\Python35\lib\site-packages\tensorflow\python\client\session.py", line 1317, in _do_run
    options, run_metadata)
  File "D:\ProgramFiles\Python\Python35\lib\site-packages\tensorflow\python\client\session.py", line 1336, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Cannot assign a device for operation 'gradients/dynamic_seq2seq/encoder/embedding_lookup_grad/ToInt32': Could not satisfy explicit device specification '' because the node was colocated with a group of nodes that required incompatible device '/device:GPU:0'
Colocation Debug Info: (same op/device list and node as above)

Caused by op 'gradients/dynamic_seq2seq/encoder/embedding_lookup_grad/ToInt32', defined at:
  File "E:/PyCharmProjects/GNMT/nmt-demo/export/exporter.py", line 106, in <module>
    exporter.export()
  File "E:/PyCharmProjects/GNMT/nmt-demo/export/exporter.py", line 24, in export
    saver = tf.train.import_meta_graph(os.path.join(self.model_dir, "translate.ckpt-21000.meta"))
  File "D:\ProgramFiles\Python\Python35\lib\site-packages\tensorflow\python\training\saver.py", line 1835, in import_meta_graph
    **kwargs)
  File "D:\ProgramFiles\Python\Python35\lib\site-packages\tensorflow\python\framework\meta_graph.py", line 660, in import_scoped_meta_graph
    producer_op_list=producer_op_list)
  File "D:\ProgramFiles\Python\Python35\lib\site-packages\tensorflow\python\util\deprecation.py", line 316, in new_func
    return func(*args, **kwargs)
  File "D:\ProgramFiles\Python\Python35\lib\site-packages\tensorflow\python\framework\importer.py", line 349, in import_graph_def
    op_def=op_def)
  File "D:\ProgramFiles\Python\Python35\lib\site-packages\tensorflow\python\framework\ops.py", line 3076, in create_op
    op_def=op_def)
  File "D:\ProgramFiles\Python\Python35\lib\site-packages\tensorflow\python\framework\ops.py", line 1561, in __init__
    self._traceback = self._graph._extract_stack()  # pylint: disable=protected-access

InvalidArgumentError (see above for traceback): Cannot assign a device for operation 'gradients/dynamic_seq2seq/encoder/embedding_lookup_grad/ToInt32': Could not satisfy explicit device specification '' because the node was colocated with a group of nodes that required incompatible device '/device:GPU:0'
Colocation Debug Info: (same op/device list and node as above)

Process finished with exit code 1

My trained model is an NMT model used to correct addresses (e.g. fixing a typo in the city name). For example: "土海市 浦东新区 张东路 1387 号" --- model --- "上海市 浦东新区 张东路 1387 号"

I think the way I create the signature_def_map is wrong, but I have no idea how to correct it. Do you have any ideas?

luozhouyang commented 6 years ago

I exported the model finally!
Here is my code:

        if not self.model_dir:
            raise ValueError("Please specify a model dir.")
        export_path = os.path.join(
            tf.compat.as_bytes(self.export_base_path),
            tf.compat.as_bytes(str(self.version_number))
        )
        config = tf.ConfigProto(allow_soft_placement=True)
        sess = tf.Session(config=config)
        saver = tf.train.import_meta_graph(os.path.join(self.model_dir, "translate.ckpt-21000.meta"))
        latest_ckpt = tf.train.latest_checkpoint(self.model_dir)
        saver.restore(sess, latest_ckpt)
        builder = tf.saved_model.builder.SavedModelBuilder(export_path)
        feature_configs = {
            'x': tf.FixedLenFeature(shape=[], dtype=tf.string),
            'y': tf.FixedLenFeature(shape=[], dtype=tf.string)
        }
        serialized_example = tf.placeholder(tf.string, name="tf_example")
        tf_example = tf.parse_example(serialized_example, feature_configs)
        x = tf.identity(tf_example['x'], name='x')
        y = tf.identity(tf_example['y'], name='y')
        predict_input = x
        predict_output = y
        predict_signature_def_map = tf.saved_model.signature_def_utils.predict_signature_def(
            inputs={
                tf.saved_model.signature_constants.PREDICT_INPUTS: predict_input
            },
            outputs={
                tf.saved_model.signature_constants.PREDICT_OUTPUTS: predict_output
            }
        )

        legacy_init_op = tf.group(tf.tables_initializer(), name="legacy_init_op")
        builder.add_meta_graph_and_variables(
            sess=sess,
            tags=[tf.saved_model.tag_constants.SERVING],
            signature_def_map={
                "predict_signature_map": predict_signature_def_map
            },
            legacy_init_op=legacy_init_op,
            assets_collection=None
        )
        builder.save()

And here is my export directory (without assets_collection):

|----1
    |----saved_model.pb
    |----variables
        |----variables.data-00000-of-00002
        |----variables.data-00001-of-00002
        |----variables.index
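
For what it's worth, a quick way to sanity-check an export like this is to load it back and list its signatures. A minimal sketch, where the path "export_base/1" stands in for the directory above:

import tensorflow as tf

with tf.Session(graph=tf.Graph()) as sess:
    # Load the SavedModel produced by the exporter above; the path is an example.
    meta_graph_def = tf.saved_model.loader.load(
        sess, [tf.saved_model.tag_constants.SERVING], "export_base/1")
    # List the available signature names, e.g. "predict_signature_map".
    print(list(meta_graph_def.signature_def.keys()))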

woodthom2 commented 6 years ago

Hi @luozhouyang, I tried your code for exporting and it works fine. Thanks for posting it. Do you also have some code for the client side? (Sorry, not sure if I should put this in a new issue; however, I think it would be very helpful for people reading this issue to see example code for both the server and client side for NMT in TensorFlow Serving.)

I am trying to make a call to the gRPC server (adapting the TF Serving example code to TensorFlow NMT). This is my code:

x = ["this is the text to translate"]

request = predict_pb2.PredictRequest()
request.model_spec.name = 'myModelName'

tp = tf.contrib.util.make_tensor_proto(x)

request.inputs['inputs'].CopyFrom(tp)
# 30 secs timeout because it takes a long time to initialise
result = stub.Predict(request, 30.0)

I also tried

x = [["this is the text to translate"]]

and adding

request.inputs['tf_example'].CopyFrom(tp)

I get errors telling me the input is in the wrong format:

AbortionError: AbortionError(code=StatusCode.INVALID_ARGUMENT, details="input size does not match signature")

or I get this error:

AbortionError: AbortionError(code=StatusCode.INVALID_ARGUMENT, details="You must feed a value for placeholder tensor 'tf_example' with dtype string
     [[Node: tf_example = Placeholder[_output_shapes=[<unknown>], dtype=DT_STRING, shape=<unknown>, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]")

Do you have any idea? Thanks!

luozhouyang commented 6 years ago

@woodthom2 I am facing the same problem as you. I exported the model, but it fails when calling serving from the client. I think the predict_input and predict_output tensors are not correct. From mnist_saved_model.py we can see that the predict_input tensor should be the input of the neural network and the predict_output tensor should be the output of the network, so my code is obviously wrong. That's the problem. I haven't solved it yet. If you work it out, please let me know.
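
In other words, the signature should point at the actual inference input and output tensors rather than at freshly created placeholders. A minimal sketch of that idea, assuming the restored inference graph is the default graph and exposes a string input placeholder and a decoded output tensor (both tensor names below are hypothetical):

import tensorflow as tf

graph = tf.get_default_graph()
# Hypothetical names; use the real input placeholder and output tensor of the NMT inference graph.
seq_input = graph.get_tensor_by_name("src_placeholder:0")
seq_output = graph.get_tensor_by_name("decoder_output:0")

signature = tf.saved_model.signature_def_utils.predict_signature_def(
    inputs={"seq_input": seq_input},
    outputs={"seq_output": seq_output})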

samithaj commented 6 years ago

see https://github.com/tensorflow/tensor2tensor#walkthrough and https://github.com/tensorflow/tensor2tensor/tree/master/tensor2tensor/serving

luozhouyang commented 6 years ago

@samithaj Thanks for your reply. I read the source code, and I think it makes things more complex. It involves new concepts like registry and problem. I believe this can be done in a much simpler way. Do you have any other ideas? Thanks anyway.

bugra commented 6 years ago

@luozhouyang Can you provide the tensorflow version?

TypeError: Expected binary or unicode string, got <tensorflow.python.framework.sparse_tensor.SparseTensor object at 0x7f282d830b00>

and it looks like that comes from the VarLenFeature for the x variable:

TypeError: Failed to convert object of type <class 'tensorflow.python.framework.sparse_tensor.SparseTensor'> to Tensor. Contents: SparseTensor(indices=Tensor("ParseExample_4/ParseExample:0", shape=(?, 2), dtype=int64), values=Tensor("ParseExample_4/ParseExample:2", shape=(?,), dtype=string), dense_shape=Tensor("ParseExample_4/ParseExample:4", shape=(2,), dtype=int64)). Consider casting elements to a supported type.

Also, when you export, did you use GPU or CPU on the model file?

woodthom2 commented 6 years ago

In case anybody is having the same problem: I didn't get NMT to work with Serving directly. The Serving examples all use simpler models such as Inception, which have clearly defined input and output placeholders, whereas NMT uses the newer Dataset API and it's not clear what the equivalents would be.

However, I did use a workaround to get NMT to work on a server as a REST API. I took the example code for NMT, which reads a text file and writes to another text file, and refactored it to receive input from a REST API and return the response via REST. So there is no use of gRPC, but you could adapt the same approach for gRPC.

@luozhouyang Until the problems with TensorFlow Serving discussed in this thread are fixed, I would suggest trying my approach; you will get NMT working on a server.
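
For illustration, a minimal sketch of that kind of REST wrapper, assuming a hypothetical translate_fn that wraps the NMT inference code (Flask and the function name are my assumptions, not part of the original workaround):

from flask import Flask, jsonify, request

app = Flask(__name__)

def translate_fn(text):
    # Placeholder: call into the NMT inference code here
    # (e.g. load the trained model once at startup and decode `text`).
    raise NotImplementedError

@app.route("/translate", methods=["POST"])
def translate():
    text = request.get_json()["text"]
    return jsonify({"translation": translate_fn(text)})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)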

luozhouyang commented 6 years ago

@woodthom2 your solution sounds good. Currently I start multiple Docker containers that provide the inference service and use HAProxy for load balancing. It works fine, but it is not efficient. I am interested in your solution. Can you share your code with me?

luozhouyang commented 6 years ago

@bugra The TensorFlow version in my code is 1.4.1 with GPU. You can export the model on CPU if you set another argument to True in the builder.add_meta_graph_and_variables() method. I cannot remember the argument name exactly; you can check the docs.
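
If I remember the TF 1.x SavedModelBuilder API correctly, the argument being referred to is probably clear_devices, which strips the recorded device placements so a GPU-trained graph can be loaded on CPU. A hedged sketch only; check the docs as suggested:

builder.add_meta_graph_and_variables(
    sess=sess,
    tags=[tf.saved_model.tag_constants.SERVING],
    signature_def_map={"predict_signature_map": predict_signature_def_map},
    legacy_init_op=legacy_init_op,
    clear_devices=True)  # assumption: this is the flag meant above
builder.save()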

bugra commented 6 years ago

@luozhouyang I wrote up my exact problem here: https://github.com/tensorflow/serving/issues/777 Do you mind commenting there? I still have a hard time understanding how to translate a feed_dict into a TensorFlow Serving request.

luozhouyang commented 6 years ago

@aforwardz @woodthom2 @samithaj @bugra @ewilderj GOOD NEWS! I exported the model and it works well with TF Serving now! I have made a pull request to tensorflow/nmt, you can have a look: pull request #344. Or you can visit my fork of tensorflow/nmt.

mdasadul commented 6 years ago

@luozhouyang Can you share your client?

luozhouyang commented 6 years ago

@mdasadul Here is my client:

# Imports assumed by the client code below:
import argparse

import tensorflow as tf
from grpc.beta import implementations
from tensorflow_serving.apis import predict_pb2
from tensorflow_serving.apis import prediction_service_pb2


# Client is an abstract base class (shared later in this thread).
class GNMTClient(Client):

    def __init__(self, model_name="address", host="localhost", port=9000, timeout=10):
        self.model_name = model_name
        self.host = host
        self.port = port
        self.timeout = timeout
        channel = implementations.insecure_channel(self.host, self.port)
        self.stub = prediction_service_pb2.beta_create_PredictionService_stub(channel)

    def request(self, input_seq):
        future = self._translate(input_seq)
        result = self._parse_result(future)
        return (input_seq, result)

    def request_many(self, input_seqs):
        futures = []
        for s in input_seqs:
            future = self._translate(s)
            futures.append(future)
        pairs = []
        for seq, future in zip(input_seqs, futures):
            result = self._parse_result(future)
            pairs.append((seq, result))
        return pairs

    def _parse_result(self, future):
        result = self._parse_translation(future.result())
        words = ""
        for w in list(result):
            words += str(w, encoding="utf8") + " "
        return words

    def _translate(self, seq):
        request = predict_pb2.PredictRequest()
        # model_name should keep the same as tf serving start arg `--model_name`
        request.model_spec.name = self.model_name
        # signature_name should keep the same as your `signature_def_map` 's `key` in `exporter`
        request.model_spec.signature_name = "serving_default"
        # `seq_input` should be the same as the `inference_signature` in the `exporter`
        request.inputs["seq_input"].CopyFrom(tf.make_tensor_proto(seq, dtype=tf.string, shape=[1, ]))
        return self.stub.Predict.future(request, self.timeout)

    @staticmethod
    def _parse_translation(result):
        # `seq_output` should be the same as the `inference_signature` in the `exporter`
        inference_output = tf.make_ndarray(result.outputs["seq_output"])
        return inference_output

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--model_name", required=True, help="model name")
    parser.add_argument("--host", default="localhost", help="model server host")
    parser.add_argument("--port", type=int, default=9000, help="model server port")
    parser.add_argument("--timeout", type=float, default=10.0, help="request timeout")
    args = parser.parse_args()

    test_seqs = [
        "上海 浦东新区 张东路",
        "浙江 杭州 下沙区",
        "北京市 海淀区 北京西路"
    ]
    client = GNMTClient(model_name=args.model_name, host=args.host, port=args.port, timeout=args.timeout)

    input_seq, output_seq = client.request(test_seqs[0])
    print("Input : %s" % input_seq)
    print("Output: %s" % output_seq)

    results = client.request_many(test_seqs)
    for r in results:
        print("Input : %s" % r[0])
        print("Output: %s" % r[1])

To run the client, you need to install the dependencies (tensorflow and tensorflow-serving-api).

ptamas88 commented 6 years ago

@luozhouyang here is my client, and it runs, but I get the same result every time no matter what the input is. Can you help me? Also, I tried your code but got an error on the first line, class GNMTClient(Client), because Client is not found.

# -*- coding : utf-8 -*- 

from __future__ import print_function

import argparse
from nltk import word_tokenize
import time
import json
import os
import tensorflow as tf

from grpc.beta import implementations
from tensorflow_serving.apis import predict_pb2
from tensorflow_serving.apis import prediction_service_pb2

PYTHONIOENCODING="UTF-8"

def parse_translation_result(args,result):

  hypotheses = tf.make_ndarray(result.outputs["seq_output"])
  str1 = ' '.join(str(e) for e in hypotheses if e != "</s>")  
  return str1

def translate(stub, model_name, tokens, timeout=5.0):

  request = predict_pb2.PredictRequest()
  request.model_spec.name = model_name
  request.model_spec.signature_name = tf.saved_model.signature_constants.DEFAULT_SERVING_SIGNATURE_DEF_KEY
  request.inputs["seq_input"].CopyFrom(tf.contrib.util.make_tensor_proto(tokens))
  xy = stub.Predict.future(request, timeout)
  return xy

def main():
  json_data = {}
  start = time.time()
  json_data['start'] = str(time.ctime(int(start)))

  parser = argparse.ArgumentParser(description="Translation client")
  parser.add_argument("--model_name", required=True,
                      help="model name (name of the file?)")
  parser.add_argument("--host", default="localhost",
                      help="model server host")
  parser.add_argument("--port", type=int, default=9000,
                      help="model server port")
  parser.add_argument("--timeout", type=float, default=10.0,
                      help="request timeout")
  parser.add_argument("--text", default="",
                      help="Untokenized input text")

  args = parser.parse_args()
  channel = implementations.insecure_channel(args.host, args.port)
  stub = prediction_service_pb2.beta_create_PredictionService_stub(channel)

  tokens_list = []
  if args.text != "":
      json_data['input'] = args.text
      text = args.text
      tokens = text.split()
      tokens_list.append(tokens)

  token_count = 0
  for tokens in tokens_list:
    trans = translate(stub, args.model_name, tokens, timeout=args.timeout)
    result = trans.result()
    best_result = parse_translation_result(args,result)
    json_data['result_'+str(token_count)] = best_result

  end = time.time()
  json_data['duration'] = str(round(end-start,3))+" sec"
  json_data['end'] = str(time.ctime(int(end)))
  json_result = json.dumps(json_data, sort_keys=True)
  print (json_result)

if __name__ == "__main__":
  main()

luozhouyang commented 6 years ago

@ptamas88 I think getting the same inference output for different inputs is not related to the exporting but to your pre-trained model. The Client is just an abstract class:

class Client:
    def request(self, input_seq):
        raise NotImplementedError()

    def request_many(self, input_seqs):
        raise NotImplementedError()

ptamas88 commented 6 years ago

@luozhouyang I have tried the model with the nmt inference command and it works well. The strange thing with Serving is that I always get the first line's translation in the output. It seems that the input is not loaded into the prediction protobuf, and this way the translation is always of the first line.

With the Client abstract class I successfully ran your client too, but I get the same result: this is an English-Hungarian model, and the output sentence is always the first line of the Hungarian corpus. The strange appearance is because I use a pretokenized corpus, but that doesn't affect the results. (I expect not perfect, but different, results.)

Input : Hello world
Output: ▁ # ▁Még ▁soha ▁nem ▁álmodtam . </s> . </s> . </s> . </s> , ▁kérlek .
Input : Hello world
Output: ▁ # ▁Még ▁soha ▁nem ▁álmodtam . </s> . </s> . </s> . </s> , ▁kérlek .
Input : What's up?
Output: ▁ # ▁Még ▁soha ▁nem ▁álmodtam . </s> . </s> . </s> . </s> , ▁kérlek .
Input : My name is Bond
Output: ▁ # ▁Még ▁soha ▁nem ▁álmodtam . </s> . </s> . </s> . </s> , ▁kérlek .

When you run your client, does it give back different (and acceptable) results?

FYI: I use TensorFlow 1.6 for training and tensorflow-serving-api 1.5 for serving on a different machine.

luozhouyang commented 6 years ago

@ptamas88 I do get the same results sometimes, but I also get different results in my tests. But the model used in my test is just a very simple model trained for only a few steps. I'll do more tests and check the code again. Thanks for pointing out the problem!

ptamas88 commented 6 years ago

@luozhouyang I think something is missing around the input placeholder during the exporting. But I don't have deep knowledge in this area :( Please let me know if you make any progress :) Thank you very much.

luozhouyang commented 6 years ago

@ptamas88 I think I found the reason. It's all due to the --infer_file argument. Serving takes this file as the input of the inference and returns the results for that input. I am working on it.

msk86 commented 6 years ago

@luozhouyang I'm using your code (https://github.com/luozhouyang/nmt/commit/dfab5a285165e5e297b96605f04a41512e0daf3b) to export the model; the model can be served, but it always returns the same result.

I guess it is related to the --infer_file argument. Any update on it?

Thanks.

nguyenvulebinh commented 5 years ago

@msk86 I have the same problem too! Any update, @luozhouyang?

gautamvasudevan commented 5 years ago

Closing as we are moving help and support to Stack Overflow:

https://stackoverflow.com/questions/tagged/tensorflow-serving

If you open a GitHub issue, it must be a bug, a feature request, or a significant problem with documentation (for small docs fixes please send a PR instead).

Thanks!

bidai541 commented 5 years ago

Hope this can help you, @luozhouyang @nguyenvulebinh @msk86. First: change exporter.py

def export(self):
    infer_model = self._create_infer_model()
    with tf.Session(graph=infer_model.graph,
                    config=tf.ConfigProto(allow_soft_placement=True)) as sess:
      feature_config = {
        'input': tf.FixedLenSequenceFeature(dtype=tf.string,
                                            shape=[], allow_missing=True),
      }
      serialized_example = tf.placeholder(dtype=tf.string, name="serialized_example")
      tf_example = tf.parse_example(serialized_example, feature_config)
      inference_input = tf.identity(tf_example['input'], name="infer_input")

      saver = infer_model.model.saver
      saver.restore(sess, self._ckpt_path)
      sess.run(tf.tables_initializer())
      # note here. Do not use decode func of model.
      inference_outputs = infer_model.model.sample_words

      inference_signature = tf.saved_model.signature_def_utils.predict_signature_def(
        inputs={
          'seq_input': inference_input
        },
        outputs={
          'seq_output': tf.convert_to_tensor(inference_outputs)
        }
      )
      legacy_ini_op = tf.group(tf.tables_initializer(), name='legacy_init_op')

      builder = tf.saved_model.builder.SavedModelBuilder(self._export_dir)
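
The snippet above stops after creating the builder; presumably the export then finishes the same way as the earlier export code in this thread, roughly (my completion, not part of the original comment):

      builder.add_meta_graph_and_variables(
        sess=sess,
        tags=[tf.saved_model.tag_constants.SERVING],
        signature_def_map={
          tf.saved_model.signature_constants.DEFAULT_SERVING_SIGNATURE_DEF_KEY: inference_signature
        },
        legacy_init_op=legacy_ini_op)
      builder.save()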

Change model_helper.py. Do not use the Dataset API:

def pre_process(src_string, src_vocab_table, eos, src_max_len=35):
  src_eos_id = tf.cast(src_vocab_table.lookup(tf.constant(eos)), tf.int32)
  src_string = tf.string_split([src_string]).values

  if src_max_len:
    src_string = src_string[:src_max_len]
  # Convert the word strings to ids
  src = tf.cast(src_vocab_table.lookup(src_string), tf.int32)
  # Add in the word counts.
  src = tf.expand_dims(src, axis=0)
  src_len = tf.size(src)
  src_len = tf.expand_dims(src_len, axis=0)
  return BatchedInput(
    initializer=None,
    source=src,
    target_input=None,
    target_output=None,
    source_sequence_length=src_len,
    target_sequence_length=None)

def create_infer_model(model_creator, hparams, scope=None, extra_args=None):
  """Create inference model."""
  graph = tf.Graph()
  src_vocab_file = hparams.src_vocab_file
  tgt_vocab_file = hparams.tgt_vocab_file

  with graph.as_default(), tf.container(scope or "infer"):
    src_vocab_table, tgt_vocab_table = vocab_utils.create_vocab_tables(
        src_vocab_file, tgt_vocab_file, hparams.share_vocab)
    reverse_tgt_vocab_table = lookup_ops.index_to_string_table_from_file(
        tgt_vocab_file, default_value=vocab_utils.UNK)

    src_placeholder = tf.placeholder(shape=[None], dtype=tf.string)
    batch_size_placeholder = tf.constant(1, tf.int64)

    iterator = pre_process(
      src_placeholder,
        src_vocab_table,
        eos=hparams.eos,
        src_max_len=hparams.src_max_len_infer)
    model = model_creator(
        hparams,
        iterator=iterator,
        mode=tf.contrib.learn.ModeKeys.INFER,
        source_vocab_table=src_vocab_table,
        target_vocab_table=tgt_vocab_table,
        reverse_target_vocab_table=reverse_tgt_vocab_table,
        scope=scope,
        extra_args=extra_args)
  return InferModel(
      graph=graph,
      model=model,
      src_placeholder=src_placeholder,
      batch_size_placeholder=batch_size_placeholder,
      iterator=iterator)

This solution works in my case (I'm not using TensorFlow Serving). The inference_signature would also need to change when using the Serving module, and that's not hard.

harshS26 commented 5 years ago

@bidai541 how did you deploy your model? Can you please share your code?

bidai541 commented 5 years ago

@harshS26 The main modification is model_helper.py. I do not use TensorFlow Serving; in my case, I exported the model to do inference in a Spark cluster.

harshS26 commented 5 years ago

@bidai541 @luozhouyang I did modify model_helper.py and exported the model using https://github.com/tensorflow/nmt/pull/344, but while making the gRPC request I am getting the following error:

grpc.framework.interfaces.face.face.AbortionError: AbortionError(code=StatusCode.INVALID_ARGUMENT, details="You must feed a value for placeholder tensor 'src_placeholder' with dtype string [[{{node src_placeholder}} = Placeholder[_output_shapes=[], dtype=DT_STRING, shape=, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]")

bidai541 commented 5 years ago

@harshS26 I think you should move the definition of the placeholder, src_placeholder = tf.placeholder(shape=[None], dtype=tf.string), out to export.py and then create the graph using this placeholder. Add it to the params of model_helper.create_infer_model().
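
As I read that suggestion, the exporter would own the placeholder and pass it into the graph-building code; a rough sketch, where the extra src_placeholder parameter on create_infer_model() is my assumption rather than the existing signature:

# In export.py (sketch):
src_placeholder = tf.placeholder(shape=[None], dtype=tf.string, name="src_placeholder")
infer_model = model_helper.create_infer_model(
    model_creator, hparams, src_placeholder=src_placeholder)  # hypothetical extra argument

# In model_helper.create_infer_model (sketch): use the passed-in placeholder
# instead of creating a new one inside the function.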

joshi-bharat commented 5 years ago

@bidai541 I am also getting the same issue as @harshS26

harshS26 commented 5 years ago

@bidai541 I made changes in exporter.py and the error got resolved. But now I get only one word as output for every input sequence, which I think is because of this line: inference_outputs = infer_model.model.sample_words

harshS26 commented 5 years ago

@bharat-robotics try this: exporter.py

def export(self):
    infer_model = self._create_infer_model()

    with tf.Session(graph=infer_model.graph,
                    config=tf.ConfigProto(allow_soft_placement=True)) as sess:
      feature_config = {
        'input': tf.FixedLenSequenceFeature(dtype=tf.string,
                                            shape=[], allow_missing=True),
      }
      inference_input = infer_model.graph.get_tensor_by_name('src_placeholder:0')     
      saver = infer_model.model.saver
      saver.restore(sess, self._ckpt_path)

      sess.run(tf.tables_initializer())
      inference_outputs = infer_model.model.sample_words
      inference_output = inference_outputs[0]
      inference_signature = tf.saved_model.signature_def_utils.predict_signature_def(
        inputs={
          'seq_input': inference_input
        },
        outputs={
          'seq_output': tf.convert_to_tensor(inference_output)
        }
      )

model_helper.py

src_placeholder = tf.placeholder(dtype=tf.string, name="src_placeholder")
batch_size_placeholder = tf.constant(1, tf.int64)

iterator = pre_process(
      src_placeholder,
        src_vocab_table,
        eos=hparams.eos,
        src_max_len=hparams.src_max_len_infer)

bidai541 commented 5 years ago

@harshS26 You are right. This is my inference function in a Spark cluster. The output is "decoder_output_bm".

import numpy as np
import tensorflow as tf

def predict(iterator):
  """
  For each partition, load the pb model and feed each item of the RDD to the placeholder.
  (model_path and args come from the enclosing script.)
  """
  from tensorflow.contrib.seq2seq.python.ops import beam_search_ops   # note: do not remove this line
  result = []
  sess = tf.Session()
  graph = tf.get_default_graph()
  tf.saved_model.loader.load(sess, [tf.saved_model.tag_constants.SERVING], model_path)
  input_feature = graph.get_tensor_by_name("src_placeholder:0")
  output = graph.get_tensor_by_name("decoder_output_bm:0")
  for item in iterator:
    one_query_result = []
    predict_result = sess.run([output], feed_dict={input_feature: [item[1]]})
    predict_result = predict_result[0]
    predict_result = np.transpose(predict_result, [2, 1, 0])
    predict_result = np.squeeze(predict_result, axis=1)
    for idx in range(args.beam_size):
      one_query_result.append(" ".join(list(predict_result[idx, :])))
    result.append((item[0] + "\t" + item[1] + "\t" + item[2], one_query_result))
  return iter(result)
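
For context, a function like this is typically handed to Spark as a partition-level map; a hedged sketch of the driver side, where rdd is a hypothetical RDD whose items have at least three fields (item[1] being the source text):

# Hypothetical driver-side usage (names are assumptions):
results_rdd = rdd.mapPartitions(predict)
for key, translations in results_rdd.collect():
    print(key, translations)
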
harshS26 commented 5 years ago

(quoting bidai541's inference function above)

I can try using your approach but I have a few questions about it:
1. In predict_result = sess.run([output], feed_dict={input_feature: [item[1]]}), what is item[1]?
2. Which variable do I assign the name decoder_output_bm?

bidai541 commented 5 years ago

(quoting the inference function and the questions above)

Sorry, my mistake.
Another modification, in model.py at line 121:

    elif self.mode == tf.contrib.learn.ModeKeys.INFER:
      self.infer_logits, _, self.final_context_state, self.sample_id = res
      self.sample_words = reverse_target_vocab_table.lookup(
          tf.to_int64(self.sample_id), name="decoder_output_bm")

This is the same node as inference_outputs = infer_model.model.sample_words. Maybe something else went wrong; as a suggestion, you can print the source string before feeding it into the graph.

harshS26 commented 5 years ago

@bidai541 this is the error i get while exporting

KeyError: "The name 'decoder_output_bm:0' refers to a Tensor which does not exist

harshS26 commented 5 years ago

@bidai541 Thanks... I get proper predictions using your code.

B1gMinnow commented 5 years ago

@harshS26 can you share your detailed code?

B1gMinnow commented 5 years ago

(quoting harshS26's exporter.py / model_helper.py changes above)

Using your code, I get this error:

grpc._channel._Rendezvous: <_Rendezvous of RPC that terminated with: status = StatusCode.INVALID_ARGUMENT details = "input must be a vector, got shape: [1,1] [[{{node StringSplit}}]]" debug_error_string = "{"created":"@1550828332.451979981","description":"Error received from peer","file":"src/core/lib/surface/call.cc","file_line":1095,"grpc_message":"input must be a vector, got shape: [1,1]\n\t [[{{node StringSplit}}]]","grpc_status":3}"

B1gMinnow commented 5 years ago

(quoting harshS26's earlier comment about getting only one word of output)

I get the same issue, just one word for every input.

B1gMinnow commented 5 years ago

(quoting the inference function and the decoder_output_bm modification above)

What is the iterator?

baishalichaudhury commented 5 years ago

@bidai541 Hi, with the modifications in model_helper.py I get the following error:

Traceback (most recent call last):
  File "/usr/lib/python3.5/runpy.py", line 184, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.5/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/baishali/MLPerf/inference/cloud/translation/gnmt/tensorflow/nmt/nmt.py", line 735, in <module>
    tf.app.run(main=main, argv=[sys.argv[0]] + unparsed)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/platform/app.py", line 125, in run
    _sys.exit(main(argv))
  File "/home/baishali/MLPerf/inference/cloud/translation/gnmt/tensorflow/nmt/nmt.py", line 727, in main
    run_main(FLAGS, default_hparams, train_fn, inference_fn)
  File "/home/baishali/MLPerf/inference/cloud/translation/gnmt/tensorflow/nmt/nmt.py", line 698, in run_main
    trans_file, hparams, num_workers, jobid)
  File "/home/baishali/MLPerf/inference/cloud/translation/gnmt/tensorflow/nmt/inference.py", line 134, in inference
    hparams)
  File "/home/baishali/MLPerf/inference/cloud/translation/gnmt/tensorflow/nmt/inference.py", line 167, in single_worker_inference
    infer_model.batch_size_placeholder: hparams.infer_batch_size
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 900, in run
    run_metadata_ptr)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1120, in _run
    self._graph, fetches, feed_dict_tensor, feed_handles=feed_handles)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 427, in __init__
    self._fetch_mapper = _FetchMapper.for_fetch(fetches)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 242, in for_fetch
    type(fetch)))
TypeError: Fetch argument None has invalid type <class 'NoneType'>

So the exact place that throws the error is in inference.py:

with infer_model.graph.as_default():
    sess.run(
        infer_model.iterator.initializer,
        feed_dict={
            infer_model.src_placeholder: infer_data,
            infer_model.batch_size_placeholder: hparams.infer_batch_size
        })

I am guessing this is because of this line:

iterator = pre_process(src_placeholder, src_vocab_table, eos=hparams.eos, src_max_len=hparams.src_max_len_infer)

which makes infer_model.iterator.initializer = None.

baishalichaudhury commented 5 years ago

@harshS26

How did you get the proper prediction, can you please paste the entire code?

harshS26 commented 5 years ago

Hi @baishalichaudhury @B1gMinnow, I have attached my code here; you guys can check: nmt.zip

baishalichaudhury commented 5 years ago

Hi

Thanks for your reply. Do you have a readme to briefly explain some of the changes you had to make in the code? Also, out of curiosity, did you try restoring the saved .pb file and making a test translation or inference without a client-based approach? For example, I want to make a simple test inference script which will load the saved .pb model, restore the placeholders, read an input file with a few sentences (preprocess the strings, maybe?), and translate. My confusion in this approach is how to restore the iterator and initialize it before inference. Do you have any experience with this?

Thanks for all your help.

Regards


baishalichaudhury commented 5 years ago

Hi

Another thing: when I try to export the model with your code, I get this error:

File "/home/baishali/MLPerf/inference/cloud/translation/gnmt/tensorflow/nmt/inference.py", line 130, in inference hparams) File "/home/baishali/MLPerf/inference/cloud/translation/gnmt/tensorflow/nmt/inference.py", line 161, in single_worker_inference infer_model.batch_size_placeholder: hparams.infer_batch_size File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 900, in run run_metadata_ptr) File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1120, in _run self._graph, fetches, feed_dict_tensor, feed_handles=feed_handles) File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 427, in init self._fetch_mapper = _FetchMapper.for_fetch(fetches) File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 242, in for_fetch type(fetch))) TypeError: Fetch argument None has invalid type <class 'NoneType'>

This is because the iterator is None, so how did you solve this?

thanks


harshS26 commented 5 years ago

@baishalichaudhury I have mailed you the steps taken to export the model.

dharm033075 commented 5 years ago

@harshS26 I used your code for exporting the model and exported it successfully, but when using the client I am getting only a one-word output. For example: "input": "when will you pay", "output": "when". Could you guide me on what changes I should make?

harshS26 commented 4 years ago

Hey @dharm033075, sorry for the late reply. Did you check my export.py file, is it the same? Also, did you change model_helper.py, model.py & nmt.py?