triton-inference-server / server

The Triton Inference Server provides an optimized cloud and edge inferencing solution.
https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/index.html
BSD 3-Clause "New" or "Revised" License

Default inputs #6561

Open saraRaris opened 11 months ago

saraRaris commented 11 months ago

Is your feature request related to a problem? Please describe. I am trying to minimize the number of models I use, especially when only a couple of parameters change internally. At the same time, I'm also trying to make sure that any time I compile a model I can automatically fetch the size used in the compilation, so the creation of the appropriate config.pbtxt files can be automated.

To illustrate this better, let's consider the following: we have ensemble_A and ensemble_B. Their pipelines are:

ensemble_A: preprocess, deep_learning_model_0, postprocess_A
ensemble_B: preprocess, deep_learning_model_1, postprocess_B

deep_learning_model_0 and deep_learning_model_1 are compiled using different image sizes. However, I would like to be able to use the same preprocess for both models, as I don't want an endless number of models with exactly the same functionality.

Describe the solution you'd like Ideally, I would like to be able to define default values for the image shape in the ensemble config.pbtxt file and then pass those over to the preprocess.

Describe alternatives you've considered I have considered passing these parameters through the client, but this doesn't work all that well for me, as I'm trying to keep this information on the server side. I don't want or need the client to be involved in a compilation decision that I'm handling purely on the server side.
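For reference, the client-side alternative would look roughly like this (a minimal sketch assuming the HTTP client and the ensemble_A config below, where DIM is an optional FP32 input of shape [2]):

import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

# The image input, as the ensemble already expects it.
image = httpclient.InferInput("INPUT_IMAGESTR", [1, 1], "BYTES")
image.set_data_from_numpy(np.array([[b"<encoded image>"]], dtype=np.object_))

# The part I want to avoid: the client has to know the compile-time size.
dim = httpclient.InferInput("DIM", [1, 2], "FP32")
dim.set_data_from_numpy(np.array([[640.0, 640.0]], dtype=np.float32))

result = client.infer(model_name="ensemble_A", inputs=[image, dim])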

Additional context Ideally it could look something like this:

{
  "name": "ensemble_A",
  "platform": "ensemble",
  "max_batch_size": 1,
  "input": [
    {
      "name": "INPUT_IMAGESTR",
      "data_type": "TYPE_STRING",
      "dims": [-1]
    },
    {
      "name": "DIM",
      "data_type": "TYPE_FP32",
      "dims": [2],
      "default": [640, 640],
      "optional": true
    }
  ],
  "output": [
    {
      "name": "OUTPUT",
      "data_type": "TYPE_FP32",
      "dims": [1, 10]
    }
  ]
}
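The routing to the shared preprocess would presumably stay the usual ensemble scheduling, just with DIM now backed by a default whenever the client omits it. A rough sketch in the text-protobuf form (the per-model tensor names RAW_IMAGE, TARGET_DIM, PREPROCESSED, INPUT and the intermediate tensor preprocessed_image are made up for illustration, and the postprocess step is omitted):

ensemble_scheduling {
  step [
    {
      model_name: "preprocess"
      model_version: -1
      input_map { key: "RAW_IMAGE" value: "INPUT_IMAGESTR" }
      input_map { key: "TARGET_DIM" value: "DIM" }
      output_map { key: "PREPROCESSED" value: "preprocessed_image" }
    },
    {
      model_name: "deep_learning_model_0"
      model_version: -1
      input_map { key: "INPUT" value: "preprocessed_image" }
      output_map { key: "OUTPUT" value: "OUTPUT" }
    }
  ]
}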
SunXuan90 commented 10 months ago

This is needed, especially for the DALI pipeline. Right now all constants need to be built into the pipeline, and the pipeline can't be shared between projects, even though they only differ in a few parameters.
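To make that concrete, this is roughly what gets frozen today: the resize target has to be a literal inside the serialized pipeline, so every different size means a separate serialized artifact (a minimal sketch assuming the DALI backend's usual external-source input; the 640s are the baked-in constants in question):

from nvidia.dali import fn, pipeline_def

@pipeline_def(batch_size=1, num_threads=1, device_id=0)
def preprocess_pipeline():
    images = fn.external_source(device="cpu", name="DALI_INPUT")
    images = fn.decoders.image(images, device="mixed")
    # The constants that currently have to be baked into the pipeline:
    images = fn.resize(images, resize_x=640, resize_y=640)
    return images

# Serializing freezes the 640x640 choice into model.dali, so a model compiled
# for another size needs its own copy of this otherwise identical pipeline.
preprocess_pipeline().serialize(filename="model.dali")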

dyastremsky commented 8 months ago

We have opened a ticket to look into this enhancement.

ref: 6179

Related issue: https://github.com/triton-inference-server/server/issues/6696

riZZZhik commented 3 months ago

@dyastremsky Hi,

Any news about this enhancement?

dyastremsky commented 3 months ago

Thanks for reaching out, Dmitry! Not yet.

riZZZhik commented 3 months ago

A not-very-good (IMHO) alternative: you can store default values in the config.pbtxt parameters (or somewhere else) and, when an input is not given, substitute the default during execute.

Example:

name: "ultra_super_model"
backend: "python"

parameters [
    {
        key: "inputs_defaults"
        value: {
            string_value: '{"not_very_important_input": "input1 - sorry to disappoint, tried my best to be useful"}'
        }
    }
]

input [
    {
        name: "very_important_input"
        data_type: TYPE_STRING
        dims: [-1]
    },
    {
        name: "not_very_important_input"
        data_type: TYPE_STRING
        dims: [-1]
        optional: true
    }
]

...
import json
from typing import Any

import numpy as np
import triton_python_backend_utils as pb_utils


class TritonPythonModel:
    def initialize(self, args: dict[str, Any]) -> None:
        self.model_config = json.loads(args["model_config"])
        # Defaults are stored as a JSON string inside the config.pbtxt parameters.
        self.inputs_defaults = json.loads(
            self.model_config["parameters"]["inputs_defaults"]["string_value"]
        )

        self.inputs = [input_["name"] for input_ in self.model_config["input"]]

    def execute(self, requests: list["pb_utils.InferenceRequest"]) -> list["pb_utils.InferenceResponse"]:
        ...
        for request in requests:
            inputs = self._get_request_inputs(request)
            ...

    def _get_request_inputs(self, request: "pb_utils.InferenceRequest") -> dict[str, np.ndarray]:
        inputs = {}
        for name in self.inputs:
            input_value = pb_utils.get_input_tensor_by_name(request, name)
            if input_value is None:
                # No need to check whether the input is optional: Triton rejects
                # requests that omit a required input before they reach execute.
                # Note the fallback is the raw value from the JSON defaults, not
                # an np.ndarray, so convert it here if downstream code needs one.
                inputs[name] = self.inputs_defaults[name]
            else:
                inputs[name] = input_value.as_numpy()

        return inputs
Leggerla commented 2 months ago

Yes, I really need this as well

MatthieuToulemont commented 1 month ago

Coming back to this: it would be amazing, and I believe it makes sense for ensemble models.