I have a decoupled model (Python backend) that receives requests from a client and forwards them to a downstream model; the intermediate model processes only some of the inputs and passes the rest along to the next model.
Currently, I'm converting the inputs to numpy arrays first and then wrapping them in InferInput:
for input_name in self.input_names[1:]:
    data_ = (
        pb_utils.get_input_tensor_by_name(request, input_name)
        .as_numpy()
        .reshape(-1)
    )
    input_ = triton_grpc.InferInput(
        input_name,
        data_.shape,
        "FP32" if data_.dtype == np.float32 else "INT32",
    )
    input_.set_data_from_numpy(data_)
    inputs.append(input_)
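As a side note, the inline `"FP32" if ... else "INT32"` branch silently mislabels any other dtype as INT32. A small lookup table makes the mapping explicit and fails loudly on unsupported dtypes; a sketch of such a helper is below (the table contents and the `triton_dtype` name are my own, not part of the snippet above; `tritonclient.utils.np_to_triton_dtype` provides the same mapping if you prefer a library function):

```python
import numpy as np

# Hypothetical helper: map a numpy dtype to Triton's datatype string,
# generalizing the two-way FP32/INT32 branch in the loop above.
_NP_TO_TRITON = {
    np.dtype(np.float32): "FP32",
    np.dtype(np.float16): "FP16",
    np.dtype(np.int32): "INT32",
    np.dtype(np.int64): "INT64",
    np.dtype(np.uint8): "UINT8",
    np.dtype(np.bool_): "BOOL",
}

def triton_dtype(arr: np.ndarray) -> str:
    """Return the Triton datatype string for this array, or raise."""
    try:
        return _NP_TO_TRITON[arr.dtype]
    except KeyError:
        raise TypeError(f"unsupported dtype for InferInput: {arr.dtype}")
```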
However, I suspect that .as_numpy() and .set_data_from_numpy() do some (de)serialization, and copying most of the inputs in a for loop seems a bit inefficient.
Is there a way to convert InferenceRequest to InferInput more efficiently?
Thanks!