triton-inference-server / server

The Triton Inference Server provides an optimized cloud and edge inferencing solution.
https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/index.html
BSD 3-Clause "New" or "Revised" License

[Feature Request] Proper documentation on usage of "label_filename" and a code example for server-side label lookup #3637

Closed BorisPolonsky closed 2 years ago

BorisPolonsky commented 2 years ago

Is your feature request related to a problem? Please describe. I tried to eliminate label-lookup code on the client side by using label_filename in config.pbtxt, but I can't find this feature in the official documentation. I tried following issue #3467, yet the content of the corresponding output tensors (obtained by setting output.parameters['classification'].int64_param = FLAGS.classes as suggested there) contains none of the labels defined in the label file. For instance, given the following config.pbtxt

platform: "tensorflow_savedmodel"
backend: "tensorflow"
max_batch_size: 2
input: [
  {
    name: "input_ids"
    data_type: TYPE_INT32
    dims: [ -1 ]
    allow_ragged_batch: false
  }
]
output [
  {
    name: "logits"
    data_type: TYPE_FP32
    dims: [ 8 ]
    label_filename: "label"
  },
  {
    name: "probs"
    data_type: TYPE_FP32
    dims: [ 8 ]
  },
  {
    name: "labels"
    data_type: TYPE_STRING
    dims: [ 8 ]
  },
  {
    name: "scores"
    data_type: TYPE_FP32
    dims: [ 8 ]
  }
]
parameters: {
    key: "TF_SIGNATURE_DEF"
    value: {
    string_value: "serving_default"
  }
}
batch_input []
batch_output []
instance_group: [
  {
    kind: KIND_MODEL
  }
]

the label file label (referenced by label_filename above), containing:

other
label-a
label-b
label-c
label-d
label-e
label-f
label-g
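
(For reference, a plausible model-repository layout for this configuration; the model name intent_a is taken from the client script below, the version directory is assumed, and the label file is expected to sit next to config.pbtxt in the model directory:)

models/
  intent_a/
    config.pbtxt
    label
    1/
      model.savedmodel/
        saved_model.pb
        variables/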

and the following script,

import argparse

import grpc
import tritonclient.grpc
from tritonclient.grpc import service_pb2
from tritonclient.grpc import service_pb2_grpc
from tritonclient.utils import deserialize_bytes_tensor

FLAGS = None

def parse_response(response):
    for raw_output, infer_output_tensor in zip(response.raw_output_contents, response.outputs):
        name = infer_output_tensor.name
        print(infer_output_tensor)

        if infer_output_tensor.datatype == "BYTES":
            content = deserialize_bytes_tensor(raw_output)
            print([entry.decode("utf-8") for entry in content.tolist()])

if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('-v',
                        '--verbose',
                        action="store_true",
                        required=False,
                        default=False,
                        help='Enable verbose output')
    parser.add_argument('-u',
                        '--url',
                        type=str,
                        required=False,
                        default='localhost:8001',
                        help='Inference server URL. Default is localhost:8001.')

    FLAGS = parser.parse_args()

    model_name = "intent_a"
    model_version = ""
    batch_size = 32

    # Create gRPC stub for communicating with the server
    channel = grpc.insecure_channel(FLAGS.url)
    grpc_stub = service_pb2_grpc.GRPCInferenceServiceStub(channel)

    # Health
    try:
        request = service_pb2.ServerLiveRequest()
        response = grpc_stub.ServerLive(request)
        print("server {}".format(response))
    except Exception as ex:
        print(ex)

    request = service_pb2.ServerReadyRequest()
    response = grpc_stub.ServerReady(request)
    print("server {}".format(response))

    request = service_pb2.ModelReadyRequest(name=model_name,
                                            version=model_version)
    response = grpc_stub.ModelReady(request)
    print("model {}".format(response))

    # Metadata
    request = service_pb2.ServerMetadataRequest()
    response = grpc_stub.ServerMetadata(request)
    print("server metadata:\n{}".format(response))

    request = service_pb2.ModelMetadataRequest(name=model_name,
                                               version=model_version)
    response = grpc_stub.ModelMetadata(request)
    print("model metadata:\n{}".format(response))

    # Configuration
    request = service_pb2.ModelConfigRequest(name=model_name,
                                             version=model_version)
    response = grpc_stub.ModelConfig(request)
    print("model config:\n{}".format(response))

    # Infer
    request = service_pb2.ModelInferRequest()
    request.model_name = model_name
    request.model_version = model_version
    request.id = "my request id"

    inputs = []

    input_ids = service_pb2.ModelInferRequest().InferInputTensor()
    input_ids.name = "input_ids"
    input_ids.datatype = "INT32"
    input_ids.shape.extend([1, 128])
    input_ids.contents.int_contents[:] = [0] * 128

    inputs = [input_ids]

    request.inputs.extend(inputs)
    del inputs

    outputs = []
    output = service_pb2.ModelInferRequest().InferRequestedOutputTensor(name="logits")
    outputs.append(output)
    output = service_pb2.ModelInferRequest().InferRequestedOutputTensor(name="probs")
    output.parameters['classification'].int64_param = 8
    outputs.append(output)
    request.outputs.extend(outputs)
    del outputs

    response = grpc_stub.ModelInfer(request)
    infer_result = tritonclient.grpc.InferResult(response)
    parse_response(response)

I got the following output:

name: "logits"
datatype: "FP32"
shape: 1
shape: 8

name: "probs"
datatype: "BYTES"
shape: 1
shape: 8

['0.784212:0', '0.140436:7', '0.036415:5', '0.029366:3', '0.006338:1', '0.002348:2', '0.000602:4', '0.000283:6']

It turns out that by specifying output.parameters['classification'].int64_param as suggested, the server sorts the values along the last dimension and encodes the output as ["{value}:{idx}", ...] rather than something like ["label-c", "label-d", ...] sorted by their corresponding values, which is the behavior I expected.
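
As a client-side workaround, these classification strings can be split apart manually. A minimal sketch, assuming the "{value}:{idx}" / "{value}:{idx}:{label}" format described in the classification extension (the label part is only present when Triton finds the label file):

def parse_classification_entry(entry):
    # Each entry is "<value>:<index>" or "<value>:<index>:<label>",
    # depending on whether a label file was found for the output.
    parts = entry.split(":", 2)
    value = float(parts[0])
    index = int(parts[1])
    label = parts[2] if len(parts) == 3 else None
    return value, index, label

# Using the output shown above:
for entry in ['0.784212:0', '0.140436:7', '0.036415:5']:
    print(parse_classification_entry(entry))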

Describe the solution you'd like Proper documentation on the usage of label_filename, plus a code example showing server-side label lookup.

Additional context Server version: nvcr.io/nvidia/tritonserver:21.09-py3

tanmayv25 commented 2 years ago

You can see the classification API documentation here: https://github.com/triton-inference-server/server/blob/main/docs/protocol/extension_classification.md.

It looks like the config is not picking up the label file correctly. The expected output should be as described in the documentation: ["{value}:{idx}:label-c", "{value}:{idx}:label-d", ...]. Can you share your complete response? Also, I'm not sure whether it would work to simply rename label to label.txt.
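
For reference, the classification extension can also be exercised through the higher-level tritonclient.grpc API, which sets the classification parameter via class_count instead of touching the raw protobuf. A minimal sketch, with the model name, input shape, and dummy data taken from the script above (the actual label output depends on the label file being picked up):

import numpy as np
import tritonclient.grpc as grpcclient

client = grpcclient.InferenceServerClient(url="localhost:8001")

# Dummy input matching the script above: one sequence of 128 zero token ids.
input_ids = grpcclient.InferInput("input_ids", [1, 128], "INT32")
input_ids.set_data_from_numpy(np.zeros((1, 128), dtype=np.int32))

# Request "probs" as a classification output with the top 8 classes.
# With a working label file, each returned entry should look like
# "<value>:<index>:<label>" per the classification extension docs.
probs = grpcclient.InferRequestedOutput("probs", class_count=8)

result = client.infer(model_name="intent_a", inputs=[input_ids], outputs=[probs])
print(result.as_numpy("probs"))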

dzier commented 2 years ago

Closing due to lack of activity.