triton-inference-server / server

The Triton Inference Server provides an optimized cloud and edge inferencing solution.
https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/index.html
BSD 3-Clause "New" or "Revised" License

incompatible constructor arguments for c_python_backend_utils.InferenceRequest #7639

Open adrtsang opened 1 month ago

adrtsang commented 1 month ago

Description

Implementing BLS in the Python backend to send an in-flight inference request to another model using c_python_backend_utils.InferenceRequest(), passing a list of c_python_backend_utils.Tensor objects as inputs, raises the following error in pb_stub.cc:

E0920 19:06:18.877435 1 pb_stub.cc:721] "Failed to process the request(s) for model 'whs_inference_model_0_0', message: TypeError: __init__(): incompatible constructor arguments. The following argument types are supported:\n 1. c_python_backend_utils.InferenceRequest(request_id: str = '', correlation_id: object = 0, inputs: List[triton::backend::python::PbTensor], requested_output_names: List[str], model_name: str, model_version: int = -1, flags: int = 0, timeout: int = 0, preferred_memory: c_python_backend_utils.PreferredMemory = <c_python_backend_utils.PreferredMemory object at 0x7f109a7f4230>, trace: c_python_backend_utils.InferenceTrace = <c_python_backend_utils.InferenceTrace object at 0x7f109a7f41f0>, parameters: object = None)\n\nInvoked with: kwargs: model_name='whs_model', inputs=[<c_python_backend_utils.Tensor object at 0x7f109a76f1b0>], requested_output_names=['output'], request_id=1\n\nAt:\n /models/whs_inference_model/1/model.py(229): execute\n"

Triton Information

tritonserver:24.07

Are you using the Triton container or did you build it yourself? I built a container based on nvcr.io/nvidia/tritonserver:24.07-py3

To Reproduce

Here's a snippet of my model.py:

import triton_python_backend_utils as pb_utils
from torch.utils.dlpack import from_dlpack
from torch.utils.dlpack import to_dlpack


class TritonPythonModel:
   ...  # initialize() sets self.model_name, etc.

   def execute(self, requests):
      responses = []
      for request in requests:
         input_array = pb_utils.get_input_tensor_by_name(request, "input_image_array")
         ...  # elided steps convert input_array into a torch tensor
         input_tensor = pb_utils.Tensor.from_dlpack("input", to_dlpack(input_array))
         inference_request = pb_utils.InferenceRequest(
                        model_name=self.model_name,
                        inputs=[input_tensor],
                        requested_output_names=['output'],
                        )
   ...

The issue is that the tensor created by pb_utils.Tensor is a c_python_backend_utils.Tensor object, while the inputs argument of InferenceRequest() expects a list of triton::backend::python::PbTensor objects. However, passing the same c_python_backend_utils.Tensor object to InferenceResponse() works fine. This seems to be a bug in pb_stub.cc.
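For comparison, here is a minimal sketch of the response path that does accept the same tensor type (the output name and dummy data are placeholders, and this only runs inside the Python backend stub):

import numpy as np
import triton_python_backend_utils as pb_utils

# pb_utils.Tensor produces a c_python_backend_utils.Tensor object.
output_tensor = pb_utils.Tensor("output", np.zeros((1, 3), dtype=np.float32))

# Wrapping that same tensor type in an InferenceResponse is accepted without error,
# which is why the InferenceRequest rejection looks inconsistent.
response = pb_utils.InferenceResponse(output_tensors=[output_tensor])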

Describe the models (framework, inputs, outputs), ideally include the model configuration file (if using an ensemble include the model configuration file for that as well).

This model.py runs in the inference stage of an ensemble pipeline, which I designed to perform pre-processing -> inference -> post-processing.

Expected behavior

pb_utils.InferenceRequest() is expected to accept a list of c_python_backend_utils.Tensor objects as its inputs.

oandreeva-nv commented 1 month ago

Hi @adrtsang, could you please provide a minimal reproducer?

adrtsang commented 1 month ago

Interestingly, the problem goes away when I simply reverse the order of the inputs and requested_output_names arguments:

inference_request = pb_utils.InferenceRequest(
                        model_name=self.model_name,
                        requested_output_names=['output'],
                        inputs=[input_tensor],
                        )
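For reference, here is a sketch of how the rest of the BLS call is wired up once the request constructs successfully (the 'output' name matches the snippet above; the error handling shown is illustrative):

# Execute the BLS request synchronously; this returns an InferenceResponse.
inference_response = inference_request.exec()

if inference_response.has_error():
    raise pb_utils.TritonModelException(inference_response.error().message())

# Retrieve the requested output and convert it back to a torch tensor via DLPack.
output_tensor = pb_utils.get_output_tensor_by_name(inference_response, 'output')
output_torch = from_dlpack(output_tensor.to_dlpack())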