triton-inference-server / server

The Triton Inference Server provides an optimized cloud and edge inferencing solution.
https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/index.html
BSD 3-Clause "New" or "Revised" License

Allow the use of different SavedModel signature_def #1352

Closed jeisinge closed 3 years ago

jeisinge commented 4 years ago

Is your feature request related to a problem? Please describe.
We are training a multi-head TensorFlow Estimator. The exported SavedModel has multiple signature definitions. Unfortunately, the serving_default signature_def does not contain all of the heads we want to infer; the predict signature_def does. Ideally, we would like to be able to specify which signature we want to infer against.

Describe the solution you'd like
The ability to specify the signature definition on the request or in the config.pbtxt.

Describe alternatives you've considered
TF-Serving supports this functionality.

We are looking into rewriting our SavedModel to have only the single serving_default signature_def.

Additional context
On TF-Serving, the signature_def is specified in the request.

KFServing does not appear to support this on the request.

https://github.com/NVIDIA/triton-inference-server/issues/94 is similar in that the serving_default signature_def did not exist in that model. https://github.com/NVIDIA/triton-inference-server/pull/344/files appears to attempt to load any available signature_def if serving_default is not found in the SavedModel.

jeisinge commented 4 years ago

FYI - we have a workaround to change the default signature_def of the SavedModel:

import tensorflow as tf
from tensorflow.python.saved_model import loader_impl
def set_default_signature_def(saved_model_path, signature_def="predict", updated_saved_model_filename="saved_model2.pb"):
    default_signature_def = tf.saved_model.DEFAULT_SERVING_SIGNATURE_DEF_KEY
    backup_signature_def = "{}_old".format(default_signature_def)

    saved_model_proto = loader_impl.parse_saved_model(saved_model_path)
    signature_defs = saved_model_proto.meta_graphs[0].signature_def

    def copy_signature_def(src, dest):
        src_signature_def = signature_defs.get(src)
        if src_signature_def is not None:
            tf.logging.info(f"Copying/replacing {src} to {dest} signature_def")
            signature_defs.get_or_create(dest).CopyFrom(src_signature_def)
        else:
            tf.logging.warn(f"Unable to copy {src} to {dest} signature_def: {src} signature_def does not exist!")

    copy_signature_def(default_signature_def, backup_signature_def)
    copy_signature_def(signature_def, default_signature_def)

    new_saved_model_filename_path = f"{saved_model_path}/{updated_saved_model_filename}"
    tf.logging.info(f"Writing out updated file to {new_saved_model_filename_path}")
    with open(new_saved_model_filename_path, "wb") as f:
        f.write(saved_model_proto.SerializeToString())

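A minimal sketch of invoking the helper above; the model-repository path is only an illustration of Triton's usual layout for the TensorFlow SavedModel backend:

set_default_signature_def(
    "/models/my_model/1/model.savedmodel",  # hypothetical model repository path
    signature_def="predict")

Note that the helper writes saved_model2.pb next to the original, so the new file presumably still has to replace (or be renamed to) saved_model.pb before the loader will pick it up.
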
We believe this to be working. However, we are still struggling with inference. We believe we need to replace build_parsing_serving_input_receiver_fn next.

deadeyegoodwin commented 4 years ago

I'm not sure about the receiver_fn, but I believe TF has some simple functions that save your model in the appropriate format for inferencing. Have you tried those?

jeisinge commented 4 years ago

We have a TF Estimator from which we export the model. It appears the two common serving_input_receiver_fns are a parsing TF Example one (row-based) and a raw one (column-based).

The parsing one fails with a message that a tensor cannot be found --- however, the same serialized Example works on TF-Serving.

We also attempted to test the raw serving input receiver. We are still working on serializing Java long[] and double[] correctly to protobuf's ByteString. If you have any tips/API, they are much appreciated!
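
Not a Triton API, but as a sketch of the byte layout the raw tensor contents need to match (assuming the usual little-endian x86 deployment), numpy makes the target explicit: fixed-width elements, row-major order, native byte order. A Java client would typically reproduce the same bytes with a java.nio.ByteBuffer set to ByteOrder.LITTLE_ENDIAN.

import numpy as np

# Illustrative values only; what matters is the layout: row-major order,
# 8 bytes per element, little-endian (native on x86) byte order.
longs = np.array([1, 2, 3], dtype=np.int64)
doubles = np.array([0.5, 1.5], dtype=np.float64)

long_bytes = longs.tobytes()      # 24 bytes of packed int64
double_bytes = doubles.tobytes()  # 16 bytes of packed IEEE-754 float64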

Also, if you know of a better way to export a TF Estimator to Triton, let us know!

deadeyegoodwin commented 4 years ago

The only experience we have with the Estimator is that it can be hard to work with. Please share your findings.

It sounds like you are using GRPC. We are transitioning to a new set of protocols (HTTP/REST and GRPC) over the next couple of releases. The GRPC protocol allows you to pass input tensors directly as long or double, so it might be easier to work with from Java. See the Roadmap section of the README for details.
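
For readers arriving after the new protocols shipped: the Python client library that accompanies them lets you hand tensors over as typed numpy arrays rather than hand-packed bytes, and the Java GRPC stubs generated from the same protocol expose equivalent int64/fp64 content fields. A minimal sketch with tritonclient.grpc, using hypothetical model, input, and output names:

import numpy as np
import tritonclient.grpc as grpcclient

client = grpcclient.InferenceServerClient(url="localhost:8001")

# Hypothetical input name, shape and datatype; these must match the model's config.pbtxt.
data = np.array([[1, 2, 3]], dtype=np.int64)
infer_input = grpcclient.InferInput("input_ids", list(data.shape), "INT64")
infer_input.set_data_from_numpy(data)

result = client.infer(model_name="my_model", inputs=[infer_input])
output = result.as_numpy("output")  # hypothetical output tensor name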

jeisinge commented 4 years ago

Yeah - I saw those new interfaces --- they appear to be better defined around what a tensor is. Once they are released, we will test them!

TF Estimator, for us, has been a steep learning curve. But, when you follow the pattern, there are many niceties and best practices --- exports, serving, evaluations, input processing, separation of concerns, etc. One of those practices is tf.estimator.export.build_parsing_serving_input_receiver_fn. At first glance, it appears to be very inefficient, and maybe it is. However, on the client side, the row-based structure aligns better with our application. We are currently experimenting with tf.estimator.export.build_raw_serving_input_receiver_fn to test with Triton and TensorRT.
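
For anyone following the same path, a minimal sketch of the raw (column-based) export being described, using the TF 1.x Estimator API with purely illustrative feature names, dtypes, and shapes:

import tensorflow as tf

def export_for_triton(estimator, export_dir="export_dir"):
    """Export a trained tf.estimator.Estimator with a raw serving input receiver."""
    # Hypothetical placeholders; match names, dtypes and shapes to the model's features.
    features = {
        "user_id": tf.placeholder(tf.int64, shape=[None, 1], name="user_id"),
        "score": tf.placeholder(tf.float32, shape=[None, 1], name="score"),
    }
    receiver_fn = tf.estimator.export.build_raw_serving_input_receiver_fn(features)
    return estimator.export_saved_model(export_dir, receiver_fn)

The exported directory can then be copied into the Triton model repository as model.savedmodel under a version directory.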

hpn0t0ad commented 4 years ago

We have a couple of TF SavedModels using SignatureDefs different from 'serving_default' which we have been using with TensorFlow Serving so far. Re-creating the models to change the SignatureDefs does not really work for us, though. Are there any plans to support specifying a specific signature in a ModelConfig?

jeisinge commented 4 years ago

We use the code I mentioned above to change the SavedModel's signature_def after we train. It works pretty well as a post-training update.

hpn0t0ad commented 4 years ago

Yeah, thanks for sharing your script! I'm kinda hesitant to go that route since it would mean duplicating model files for each sig, right? Not sure I wanna do that...

jeisinge commented 4 years ago

Good point. That does mean that.

CoderHam commented 3 years ago

@jeisinge I believe support for this was already added a while ago.
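
For anyone landing here from search: if the support being referred to is the TensorFlow backend's model configuration parameter for choosing a signature, the config.pbtxt addition would look roughly like the sketch below. The TF_SIGNATURE_DEF key reflects my understanding of the backend parameter; verify the exact name against the tensorflow_backend documentation for the Triton release in use.

parameters: {
  key: "TF_SIGNATURE_DEF"
  value: { string_value: "predict" }
}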