I’m using the Triton python_backend to run the PyTorch example from the python_backend repo. I packaged the PyTorch dependencies into a conda environment, and the model loads successfully. However, when I run the client inference script provided in the repo, I get the following error when reading the outputs from the `httpclient` response. The response appears to be empty:
```
File ~/SageMaker/custom-miniconda/miniconda/envs/custom_python310/lib/python3.10/site-packages/tritonclient/http/_infer_result.py:208, in InferResult.as_numpy(self, name)
    204 if not has_binary_data:
    205     np_array = np.array(
    206         output["data"], dtype=triton_to_np_dtype(datatype)
    207     )
--> 208     np_array = np_array.reshape(output["shape"])
    209     return np_array
    210 return None

ValueError: cannot reshape array of size 0 into shape (4,)
```
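The failing frame is the non-binary branch of `InferResult.as_numpy`: it builds an array from the output's JSON `data` list and then reshapes it to the reported `shape`. The sketch below replays that logic with NumPy alone, assuming the server sent `"data": []` together with `"shape": [4]`, which is what the error message implies. `output_to_numpy` and the dtype table are hypothetical stand-ins, not the tritonclient implementation.

```python
import numpy as np

# Hypothetical subset of Triton datatype names mapped to NumPy dtypes.
TRITON_TO_NP = {
    "FP32": np.float32,
    "FP64": np.float64,
    "INT32": np.int32,
    "INT64": np.int64,
}


def output_to_numpy(output: dict) -> np.ndarray:
    """Mimic the non-binary branch of as_numpy for one decoded output entry."""
    np_array = np.array(output["data"], dtype=TRITON_TO_NP[output["datatype"]])
    # Raises ValueError when the element count in "data" does not match "shape".
    return np_array.reshape(output["shape"])


good = {"name": "OUTPUT0", "datatype": "FP32", "shape": [4], "data": [1.0, 2.0, 3.0, 4.0]}
empty = {"name": "OUTPUT0", "datatype": "FP32", "shape": [4], "data": []}

print(output_to_numpy(good).shape)  # (4,)
try:
    output_to_numpy(empty)
except ValueError as exc:
    print(exc)  # cannot reshape array of size 0 into shape (4,)
```

In other words, the client-side reshape is only the symptom; the question is why the server's JSON carries an empty `data` list while still reporting shape `[4]`.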
I also ran the add_sub example, and `response.as_numpy("OUTPUT0")` worked fine and returned the expected output.
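Since `add_sub` works, one way to narrow this down is to compare the decoded JSON returned by `response.get_response()` for both models. The hypothetical helper below reports, for each output, the element count implied by `shape`, how many elements arrived inline in the JSON `data` field, and the byte size if the tensor was sent through Triton's binary data extension (marked by a `binary_data_size` parameter). The `sample` dict is made up to illustrate the failing case; it is not a captured response.

```python
import math


def summarize_outputs(response_json: dict) -> list:
    """Summarize each output entry of a decoded Triton HTTP inference response."""
    rows = []
    for output in response_json.get("outputs", []):
        params = output.get("parameters", {})
        rows.append({
            "name": output["name"],
            "expected_elements": math.prod(output["shape"]),
            # Present when the tensor was sent via the binary data extension.
            "binary_bytes": params.get("binary_data_size"),
            # Present when the tensor was sent inline as a JSON list.
            "json_elements": len(output["data"]) if "data" in output else None,
        })
    return rows


# Illustrative (made-up) response resembling the failing case.
sample = {
    "outputs": [
        {"name": "OUTPUT0", "datatype": "FP32", "shape": [4], "data": []},
        {"name": "OUTPUT1", "datatype": "FP32", "shape": [4], "data": []},
    ],
}

for row in summarize_outputs(sample):
    print(row)
```

With a real result, `summarize_outputs(response.get_response())` can be run for both the PyTorch model and `add_sub` and the rows compared side by side.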
Triton Information
What version of Triton are you using?
server_version 2.41.0
Are you using the Triton container or did you build it yourself?
I’m using a SageMaker Docker image for Triton server: 763104351884.dkr.ecr.us-east-1.amazonaws.com/sagemaker-tritonserver:23.12-py3
To Reproduce
model.py:
```python
import json

# triton_python_backend_utils is available in every Triton Python model. You
# need to use this module to create inference requests and responses. It also
# contains some utility functions for extracting information from model_config
# and converting Triton input/output types to numpy types.
import triton_python_backend_utils as pb_utils
from torch import nn


class AddSubNet(nn.Module):
    """
    Simple AddSub network in PyTorch. This network outputs the sum and
    subtraction of the inputs.
    """

    def __init__(self):
        super(AddSubNet, self).__init__()

    def forward(self, input0, input1):
        return (input0 + input1), (input0 - input1)


class TritonPythonModel:
    """Your Python model must use the same class name. Every Python model
    that is created must have "TritonPythonModel" as the class name.
    """

    def initialize(self, args):
        """`initialize` is called only once when the model is being loaded.
        Implementing `initialize` function is optional. This function allows
        the model to initialize any state associated with this model.

        Parameters
        ----------
        args : dict
          Both keys and values are strings. The dictionary keys and values are:
          * model_config: A JSON string containing the model configuration
          * model_instance_kind: A string containing model instance kind
          * model_instance_device_id: A string containing model instance device ID
          * model_repository: Model repository path
          * model_version: Model version
          * model_name: Model name
        """

        # You must parse model_config. JSON string is not parsed here
        self.model_config = model_config = json.loads(args["model_config"])

        # Get OUTPUT0 configuration
        output0_config = pb_utils.get_output_config_by_name(model_config, "OUTPUT0")

        # Get OUTPUT1 configuration
        output1_config = pb_utils.get_output_config_by_name(model_config, "OUTPUT1")

        # Convert Triton types to numpy types
        self.output0_dtype = pb_utils.triton_string_to_numpy(
            output0_config["data_type"]
        )
        self.output1_dtype = pb_utils.triton_string_to_numpy(
            output1_config["data_type"]
        )

        # Instantiate the PyTorch model
        self.add_sub_model = AddSubNet()

    def execute(self, requests):
        """`execute` must be implemented in every Python model. `execute`
        function receives a list of pb_utils.InferenceRequest as the only
        argument. This function is called when an inference is requested
        for this model. Depending on the batching configuration (e.g. Dynamic
        Batching) used, `requests` may contain multiple requests. Every
        Python model must create one pb_utils.InferenceResponse for every
        pb_utils.InferenceRequest in `requests`. If there is an error, you can
        set the error argument when creating a pb_utils.InferenceResponse.

        Parameters
        ----------
        requests : list
          A list of pb_utils.InferenceRequest

        Returns
        -------
        list
          A list of pb_utils.InferenceResponse. The length of this list must
          be the same as `requests`
        """

        output0_dtype = self.output0_dtype
        output1_dtype = self.output1_dtype

        responses = []

        # Every Python backend must iterate over every one of the requests
        # and create a pb_utils.InferenceResponse for each of them.
        for request in requests:
            # Get INPUT0
            in_0 = pb_utils.get_input_tensor_by_name(request, "INPUT0")

            # Get INPUT1
            in_1 = pb_utils.get_input_tensor_by_name(request, "INPUT1")

            # The inputs are passed as NumPy arrays; AddSubNet.forward only
            # uses + and -, so out_0 and out_1 are NumPy arrays as well.
            out_0, out_1 = self.add_sub_model(in_0.as_numpy(), in_1.as_numpy())

            # Create output tensors. You need pb_utils.Tensor
            # objects to create pb_utils.InferenceResponse.
            out_tensor_0 = pb_utils.Tensor("OUTPUT0", out_0.astype(output0_dtype))
            out_tensor_1 = pb_utils.Tensor("OUTPUT1", out_1.astype(output1_dtype))

            # Create InferenceResponse. You can set an error here in case
            # there was a problem with handling this inference request.
            # Below is an example of how you can set errors in inference
            # response:
            #
            # pb_utils.InferenceResponse(
            #     output_tensors=..., error=pb_utils.TritonError("An error occurred"))
            inference_response = pb_utils.InferenceResponse(
                output_tensors=[out_tensor_0, out_tensor_1]
            )
            responses.append(inference_response)

        # You should return a list of pb_utils.InferenceResponse. Length
        # of this list must match the length of `requests` list.
        return responses

    def finalize(self):
        """`finalize` is called only once when the model is being unloaded.
        Implementing `finalize` function is optional. This function allows
        the model to perform any necessary clean ups before exit.
        """
        print("Cleaning up...")
```
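To rule out the model logic itself, the arithmetic in `execute` can be replayed outside Triton. `AddSubNet.forward` only applies `+` and `-`, so calling it on the NumPy arrays from `as_numpy()` returns NumPy arrays, and a NumPy-only replay exercises the same operations. This sketch assumes FP32 inputs and outputs of shape `[4]` (the shape in the error); the real dtypes come from config.pbtxt, which is not shown, and `add_sub` is a hypothetical helper.

```python
import numpy as np


def add_sub(input0: np.ndarray, input1: np.ndarray):
    """Same arithmetic as AddSubNet.forward, applied directly to NumPy arrays."""
    return input0 + input1, input0 - input1


# Assumed request shapes/dtypes: two FP32 vectors of length 4.
in_0 = np.arange(4, dtype=np.float32)
in_1 = np.ones(4, dtype=np.float32)

out_0, out_1 = add_sub(in_0, in_1)
# Mirrors the .astype(output_dtype) calls in execute, assuming FP32 outputs.
out_0 = out_0.astype(np.float32)
out_1 = out_1.astype(np.float32)

print(out_0.shape, out_0.dtype)  # (4,) float32
print(out_1.shape, out_1.dtype)  # (4,) float32
```

If this yields two non-empty arrays, the `execute` arithmetic is unlikely to be what empties the response, and comparing the raw HTTP body is a reasonable next step.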
config.pbtxt:

The client.py; the traceback points at `response.as_numpy("OUTPUT0")` on line 33 of this notebook cell:
```python
response = client.infer(model_name, inputs, request_id=str(1), outputs=outputs)

result = response.get_response()
output0_data = response.as_numpy("OUTPUT0")
output1_data = response.as_numpy("OUTPUT1")

print(
    "INPUT0 ({}) + INPUT1 ({}) = OUTPUT0 ({})".format(
        input0_data, input1_data, output0_data
    )
)
```
Expected behavior
I expect the client script to convert the output tensors in the HTTP response to NumPy arrays.
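To take tritonclient out of the picture, the same request can be sent as plain JSON using the KServe v2 HTTP/REST inference protocol that Triton implements, asking for the outputs inline (`"binary_data": false` on each requested output) so the raw body shows exactly what the server returned. In this sketch, `build_infer_request` and `infer` are hypothetical helpers, and the FP32 `[N]` input shapes are an assumption since config.pbtxt is not shown.

```python
import json
import urllib.request


def build_infer_request(input0, input1):
    """Build a KServe v2 JSON inference request body for the add/sub model."""
    def tensor(name, values):
        # Assumed FP32 1-D inputs; adjust shape/datatype to match config.pbtxt.
        return {"name": name, "shape": [len(values)], "datatype": "FP32", "data": values}

    return {
        "inputs": [tensor("INPUT0", input0), tensor("INPUT1", input1)],
        # Ask Triton to return outputs inline as JSON instead of as binary blobs.
        "outputs": [
            {"name": "OUTPUT0", "parameters": {"binary_data": False}},
            {"name": "OUTPUT1", "parameters": {"binary_data": False}},
        ],
    }


def infer(url, body):
    """POST the request body and return the decoded JSON response."""
    req = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```

Against a running server, something like `infer("http://localhost:8000/v2/models/<model_name>/infer", build_infer_request([0.1, 0.2, 0.3, 0.4], [0.5, 0.6, 0.7, 0.8]))` returns the decoded body; the host and port here are assumptions (Triton's default HTTP port is 8000, but a SageMaker deployment may expose a different endpoint). If `data` is empty in this raw response as well, the problem is on the server side rather than in tritonclient.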