triton-inference-server / server

The Triton Inference Server provides an optimized cloud and edge inferencing solution.
https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/index.html
BSD 3-Clause "New" or "Revised" License

GRPC prediction calls with BYTES input errors out in Big Endian Machines #5811

Open Jawahars opened 1 year ago

Jawahars commented 1 year ago

Description When Triton Server is hosted on a big-endian machine, GRPC calls with BYTES input fail.

Triton Information What version of Triton are you using? 23.01

Are you using the Triton container or did you build it yourself? Built it ourselves, using this docker file for an s390x machine

To Reproduce

  1. Run Triton Server on a Big Endian machine. In my case, I used s390x machine.
  2. Let the input tensor be TYPE_STRING and have the model simply return the input it receives. The model configuration is from here
  3. To replicate the issue, run the code below. It accepts two arguments:
    • host (ipaddress:port)
    • model name
      Python client code to replicate
from tritonclient.grpc import service_pb2, service_pb2_grpc, InferResult
import grpc
import numpy as np

import string
import random
import logging
import argparse

CLI = argparse.ArgumentParser()
CLI.add_argument("--model", type=str, default="model_test")
CLI.add_argument("--host", type=str, default="127.0.0.1:8001")
args = CLI.parse_args()

def test_grpc_predict(host, model):
    # We use a simple model that takes 1 STRING input and returns the same
    model_version = "1"
    batch_size = 2
    no_of_batch = 1
    input_datatype = 'BYTES'

    # Generate the request
    request = service_pb2.ModelInferRequest()
    request.model_name = model
    request.model_version = model_version

    # Populate the inputs in inference request
    request.inputs.extend([set_input(batch_size, no_of_batch, input_datatype)])

    # Populate the outputs in the inference request
    output0 = service_pb2.ModelInferRequest().InferRequestedOutputTensor()
    output0.name = "OUT0"
    request.outputs.extend([output0])

    channel = grpc.insecure_channel(host)
    stub = service_pb2_grpc.GRPCInferenceServiceStub(channel)

    response = stub.ModelInfer(request)
    results = InferResult(response)
    logging.debug('inference output from grpc %s',
                  results.as_numpy('OUT0'))
    logging.debug('inference output complete from grpc')

def generate_test_data(batchSize: int = 2, noOfbatch: int = 1, input_datatype: str = 'FP64', isEncode=False):
    if (input_datatype.startswith('FP')):
        random_data_array = np.random.rand(noOfbatch, batchSize)
    elif (input_datatype.startswith('UINT')):
        random_data_array = np.random.randint(0, 1000, (noOfbatch, batchSize))
    elif (input_datatype.startswith('INT')):
        random_data_array = np.random.randint(-1000,
                                              1000, (noOfbatch, batchSize))
    elif (input_datatype.startswith('BOOL')):
        random_data_array = np.random.choice(
            a=[False, True], size=(noOfbatch, batchSize))
    else:
        # BYTES/STRING case: generate batchSize random words per batch
        random_data_array = np.array([generate_random_words(
            numberOfWords=batchSize, isEncode=isEncode) for i in range(noOfbatch)])
    return random_data_array

def generate_random_words(numberOfWords: int = 1, charLen: int = 4, isEncode=False):
    if (isEncode):
        return [generate_random_word(charLen).encode('utf-8') for i in range(numberOfWords)]
    return [generate_random_word(charLen) for i in range(numberOfWords)]

def generate_random_word(charLen: int = 4):
    chrs = string.ascii_lowercase  # Change your required characters here
    return ''.join(random.choices(chrs, k=charLen))

def set_input(batch_size, no_of_batch, input_datatype):
    input0 = service_pb2.ModelInferRequest().InferInputTensor()
    input0.name = "IN0"
    input0.shape.extend([no_of_batch, batch_size])
    input0.datatype = get_datatype(input_datatype)
    # Input data
    testdata = generate_test_data(
        batchSize=batch_size, noOfbatch=no_of_batch, input_datatype=input_datatype, isEncode=True)
    input0_data = testdata.flatten()
    # Copy the flattened data into the protobuf contents field matching the datatype
    if (input0.datatype == 'STRING' or input0.datatype == 'BYTES'):
        input0.contents.bytes_contents[:] = input0_data
    elif (input0.datatype == 'FP64'):
        input0.contents.fp64_contents[:] = input0_data
    elif (input0.datatype == 'INT8'):
        input0.contents.int_contents[:] = input0_data
    elif (input0.datatype == 'BOOL'):
        input0.contents.bool_contents[:] = input0_data
    return input0

def get_datatype(input_datatype: str = 'FP64'):
    if (input_datatype == 'STRING'):
        return 'BYTES'
    return input_datatype

if __name__ == "__main__":
    # Enable debug logging so the inference output is actually printed
    logging.basicConfig(level=logging.DEBUG)
    test_grpc_predict(args.host, args.model)
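
To run the repro (the script filename here is hypothetical), point it at a server that has the echo model loaded, e.g. python repro_grpc_bytes.py --host 127.0.0.1:8001 --model model_test.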

Actual behavior The request fails with the following error:

{grpc_message: "Failed to process the request(s) for model instance 'model_test_0_1', message: error: unpack_from requires a buffer of at least 67108868 bytes for unpacking 67108864 bytes at offset 4 (actual buffer size is 16)\n\nAt:\n /opt/tritonserver/backends/python/triton_python_backend_utils.py(117): deserialize_bytes_tensor\n", grpc_status: 13, created_time: "2023-05-18T11:03:38.200987+05:30"}
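
The numbers in the error are diagnostic on their own. Triton serializes a BYTES tensor as a 4-byte little-endian length prefix followed by the raw bytes, so two 4-character strings occupy 2 x (4 + 4) = 16 bytes, matching "actual buffer size is 16" above. Reading the prefix with the host's native byte order on a big-endian machine turns the length 4 into 0x04000000 = 67108864. A minimal sketch (the word values are hypothetical):

import struct

# Two 4-byte strings serialized with a little-endian uint32 length prefix,
# as in Triton's BYTES wire format: 16 bytes total.
buf = b"".join(struct.pack("<I", len(w)) + w for w in (b"abcd", b"wxyz"))

print(struct.unpack_from("<I", buf, 0)[0])  # 4        (correct on any host)
print(struct.unpack_from(">I", buf, 0)[0])  # 67108864 (what native "I" yields
                                            # on a big-endian host)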

Expected behavior The infer method should return an output tensor identical to the input.

Additional Information

dyastremsky commented 1 year ago

Thank you for your detailed bug report. There is a fix being implemented in this pull request.

Jawahars commented 1 year ago

> Thank you for your detailed bug report. There is a fix being implemented in this pull request.

Thank you. It seems this pull request addresses the same issue, but for HTTP. We might need to do something similar for GRPC.
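
For reference, a minimal sketch of the kind of change needed on the Python-backend path (an assumption based on the error trace above, not the actual PR diff): pin the length prefix to little-endian with an explicit "<I" struct format instead of the native "I".

import struct

def read_length_prefix(buf: bytes, offset: int) -> int:
    # "<I" fixes the 4-byte length prefix to little-endian on every host;
    # native "I" follows host byte order and misreads the prefix on s390x.
    return struct.unpack_from("<I", buf, offset)[0]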

dyastremsky commented 1 year ago

Good catch, thanks. We've filed a ticket for this.

nnshah1 commented 1 year ago

@Jawahars Quick question on other data types. Were you able to confirm correct operation of INT8 / FP32 types? My guess is there are similar issues with endianness - but wanted to confirm.

Jawahars commented 1 year ago

@nnshah1 For FP64, interestingly, there is no error, but the output is not as expected: it should be the same as the input.

Input:

[
name: "IN0"
datatype: "FP64"
shape: 1
shape: 2
contents {
  fp64_contents: 0.21946542841044636
  fp64_contents: 0.9776509135766875
}
]

Actual output: [[ 7.75420243e+19 -4.00605503e-44]]
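
This is consistent with a byte-order mix-up: the same eight bytes decode to unrelated values when read with the opposite endianness. A hedged sketch, illustrative only and not the server's exact code path:

import numpy as np

# Little-endian float64 values reinterpreted as big-endian: garbage out.
vals = np.array([0.21946542841044636, 0.9776509135766875], dtype="<f8")
print(np.frombuffer(vals.tobytes(), dtype=">f8"))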

Similarly for INT8:

Input:

[
name: "IN0"
datatype: "INT8"
shape: 1
shape: 2
contents {
  int_contents: -441
  int_contents: 242
}
]

Output: [[-1 0]]
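
The INT8 result also fits an endianness explanation: int_contents is a 32-bit field on the wire, and if the server narrows it to INT8 by taking the first byte of each native int32, a big-endian host picks up the high byte instead of the low one. A hedged sketch (a guess at the mechanism, not confirmed against the server code):

import numpy as np

# Big-endian int32 values; the first byte of each is [-1, 0],
# matching the output above.
print(np.array([-441, 242], dtype=">i4").view(np.int8)[::4])
# On a little-endian host the same slice gives the low bytes: [71, -14].
print(np.array([-441, 242], dtype="<i4").view(np.int8)[::4])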

nnshah1 commented 1 year ago

Thanks for confirming - that is what I expected as well (no exception, but incorrect values). I am working with the team on next steps for big-endian support.

jgrsdave commented 10 months ago

@dyastremsky The pull request solved the issue for HTTP. Do we have any solution for GRPC?

dyastremsky commented 10 months ago

I believe, based on @nnshah1's comment here, that a GRPC solution for big-endian support is upcoming.