triton-inference-server / server

The Triton Inference Server provides an optimized cloud and edge inferencing solution.
https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/index.html

allow constant input tensors in (ensemble) models #5037

charlesmelby commented 1 year ago

Is your feature request related to a problem? Please describe.

I am working with multiple nested ensembles that use shared components (some are Python) with ensemble-dependent configuration.

I would like to be able to specify the configuration as a constant tensor in the ensemble config. (N.B. It would also be nice to be able to define constant input tensors for backends other than ensemble.) Currently I either have to

  1. provide the configuration data as an input in the inference request (sketched below), which requires the requester to be aware of internal details; or
  2. maintain a different copy of the component for every ensemble, which is unclean and results in a large number of unnecessary Python processes.
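
For concreteness, option 1 means every request has to carry the configuration tensor explicitly. A minimal sketch with the Python tritonclient, using the multiply-numbers example defined below:

# Sketch of option 1: the client has to supply the internal
# configuration tensor ("b" here) on every request.
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

a = httpclient.InferInput("a", [1], "FP32")
a.set_data_from_numpy(np.array([2.0], dtype=np.float32))

# The requester has to know this internal detail:
b = httpclient.InferInput("b", [1], "FP32")
b.set_data_from_numpy(np.array([5.0], dtype=np.float32))

result = client.infer("multiply-numbers", inputs=[a, b])
print(result.as_numpy("c"))  # [10.]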

Describe the solution you'd like

For example, for a model that multiplies two numbers:

name: "multiply-numbers"
platform: "onnxruntime_onnx"
backend: "onnxruntime"
max_batch_size: 0
input {
    name: "a"
    data_type: TYPE_FP32
    dims: [1]
}
input {
    name: "b"
    data_type: TYPE_FP32
    dims: [1]
}
output {
    name: "c"
    data_type: TYPE_FP32
    dims: [1]
}
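
For reference, a model matching this config could be built with a short onnx.helper sketch like the following; the single Mul node is an assumption consistent with a model that multiplies two numbers, and the a/b/c names come from the config above.

# Minimal sketch of an ONNX model matching the config above:
# c = a * b, with 1-element FP32 tensors.
import onnx
from onnx import helper, TensorProto

node = helper.make_node("Mul", inputs=["a", "b"], outputs=["c"])
graph = helper.make_graph(
    [node],
    "multiply-numbers",
    inputs=[
        helper.make_tensor_value_info("a", TensorProto.FLOAT, [1]),
        helper.make_tensor_value_info("b", TensorProto.FLOAT, [1]),
    ],
    outputs=[helper.make_tensor_value_info("c", TensorProto.FLOAT, [1])],
)
onnx.save(helper.make_model(graph), "model.onnx")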

I'd like to use something akin to tensor protos to specify a value for an input tensor:

name: "multiply-by-5-ensemble"
platform: "ensemble"
max_batch_size: 0
input {
    name: "ensemble-a"
    data_type: TYPE_FP32
    dims: [1]
}
constant {
    name: "constant-b"
    data_type: TYPE_FP32
    dims: [1]
    fp32_value: [5.0]
}
output {
    name: "ensemble-c"
    data_type: TYPE_FP32
    dims: [1]
}
ensemble_scheduling {
  step {
    model_name: "multiply-numbers"
    model_version: -1
    input_map { key: "a", value: "ensemble-a" }
    input_map { key: "b", value: "constant-b" }
    output_map { key: "c", value: "ensemble-c" }
  }
}
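
With a constant defined this way, the requester would only send ensemble-a; constant-b would never appear in the request, and the same multiply-numbers component could be shared by ensembles that bind different constants.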

Describe alternatives you've considered

The other option I can think of is creating a backend that generates a constant tensor from an empty input, although I don't know if tritonserver can accommodate models with no inputs.
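
Such a constant-generator could be a Python backend model along these lines (a sketch; reading the value from a parameters entry called constant_value is my own convention, not anything Triton prescribes):

# Sketch of a Python-backend model that emits a constant tensor.
# The constant comes from a model-config parameter, so each
# ensemble can reuse the same code with a different config.pbtxt.
import json
import numpy as np
import triton_python_backend_utils as pb_utils

class TritonPythonModel:
    def initialize(self, args):
        config = json.loads(args["model_config"])
        # Assumed convention: parameters { key: "constant_value" ... }
        value = config["parameters"]["constant_value"]["string_value"]
        self.constant = np.array([float(value)], dtype=np.float32)

    def execute(self, requests):
        out = pb_utils.Tensor("constant-b", self.constant)
        return [
            pb_utils.InferenceResponse(output_tensors=[out])
            for _ in requests
        ]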

charlesmelby commented 1 year ago

An additional option to read from a file like in ModelWarmup.Input would also be useful.
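
In the proposed syntax above, that might look something like the following, where input_data_file is a hypothetical field mirroring the one in ModelWarmup.Input and the file would presumably live in the model directory:

constant {
    name: "constant-b"
    data_type: TYPE_FP32
    dims: [1]
    input_data_file: "constant_b"
}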

tanmayv25 commented 1 year ago

Tritonserver can definitely accommodate models with no inputs. I think this should unblock you for now. Thanks for the RFE.
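
For completeness, a zero-input config for the generator sketched above might look like this (with the same assumed constant_value parameter):

name: "constant-b-generator"
backend: "python"
max_batch_size: 0
output {
    name: "constant-b"
    data_type: TYPE_FP32
    dims: [1]
}
parameters {
    key: "constant_value"
    value: { string_value: "5.0" }
}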