npuichigo opened 10 months ago
https://platform.openai.com/docs/api-reference/completions/object#completions/object-usage What about adding a usage field to the TRT ensemble models to return token usage like OpenAI, at least the prompt and output token lengths? It would make it easier to provide an OpenAI-compatible API.
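For reference, the usage object in an OpenAI completion response has this shape (per the linked docs); these are the three fields to reproduce:

```python
# Shape of the OpenAI "usage" object (values are illustrative):
usage = {
    "prompt_tokens": 5,
    "completion_tokens": 7,
    "total_tokens": 12,  # prompt_tokens + completion_tokens
}
```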
Have you solved the problem?
not yet
Do you know how to do it? Any ideas?
I think you could customize the logic in the preprocessing and postprocessing models to do the calculation.
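For example, here is a minimal, untested sketch of where each count could come from, assuming the stock tensorrtllm_backend models (the tokenizer lives in preprocessing, and the per-sequence generated-token counts are available in postprocessing; the names below are assumptions):

```python
import numpy as np

def prompt_token_count(tokenizer, query: str) -> int:
    # Preprocessing already tokenizes the prompt, so the count is a len() away.
    # `tokenizer` is any HF-style tokenizer with an encode() method.
    return len(tokenizer.encode(query))

def completion_token_count(sequence_lengths) -> int:
    # Postprocessing already receives how many tokens each sequence produced
    # (mirrors the SEQUENCE_LENGTH input of the stock postprocessing model).
    return int(np.asarray(sequence_lengths).sum())
```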
Thank you. I tried, but it didn't work.
I managed to get the output_token_len into the output, but I can't add the input_token_len, since that information is not passed down the pipeline to the postprocessing model.
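One possible workaround, as an untested sketch: the preprocessing model runs the tokenizer, so it already knows the prompt length. You could expose it as an extra preprocessing output and map it straight to a new ensemble output via output_map, without routing it through postprocessing. The INPUT_TOKEN_LEN name and the input_id variable below are assumptions modeled on the stock preprocessing model:

```python
import numpy as np
import triton_python_backend_utils as pb_utils

# Stand-in for the tokenized prompts produced earlier in execute():
input_id = [[101, 2023, 2003], [101, 2054]]

input_token_len_tensor = pb_utils.Tensor(
    'INPUT_TOKEN_LEN',
    np.array([[len(ids)] for ids in input_id], dtype=np.int32))
# Return this alongside the existing tensors in the InferenceResponse, then
# map INPUT_TOKEN_LEN to a new ensemble output in ensemble/config.pbtxt.
```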
Here's how to do it:
We need to create a new output field in the postprocessing model and make small changes to the code to handle the information retrieval and output.
The first step is to modify postprocessing/config.pbtxt and add the following content:
```
output [
  {
    name: "OUTPUT_TOKEN_LEN"
    data_type: TYPE_INT32
    dims: [ -1 ]
  },
  ...
]
```
Then we need to change postprocessing/1/model.py to add the logic that outputs the tensor for the new output field:
```python
import numpy as np
import triton_python_backend_utils as pb_utils


class TritonPythonModel:

    def initialize(self, args):
        ...
        # model_config is parsed from args['model_config'] in the elided
        # stock code above.
        # Parse model output configs
        output_names = ["OUTPUT", "OUTPUT_TOKEN_LEN"]
        for output_name in output_names:
            setattr(
                self,
                output_name.lower() + "_dtype",
                pb_utils.triton_string_to_numpy(
                    pb_utils.get_output_config_by_name(
                        model_config, output_name)['data_type']))

    def execute(self, requests):
        ...
        # Number of tokens generated for each sequence
        output_token_len_tensor = pb_utils.Tensor(
            'OUTPUT_TOKEN_LEN',
            np.array(sequence_lengths).astype(self.output_token_len_dtype))
        outputs.append(output_token_len_tensor)
```
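For context, a sketch of where sequence_lengths can come from inside execute(): the stock postprocessing model already receives a SEQUENCE_LENGTH input tensor (if yours is configured differently, treat this as an assumption to adapt):

```python
# Inside the loop over requests in execute():
sequence_lengths = pb_utils.get_input_tensor_by_name(
    request, 'SEQUENCE_LENGTH').as_numpy()
```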
Then we can modify ensemble/config.pbtxt, adding the new field both to the output section and to the ensemble pipeline, as shown in the following content:
```
output [
  {
    name: "output_token_len"
    data_type: TYPE_INT32
    dims: [ -1 ]
  },
  ...
]
ensemble_scheduling {
  step [
    {
      model_name: "postprocessing"
      model_version: -1
      ...
      output_map {
        key: "OUTPUT_TOKEN_LEN"
        value: "output_token_len"
      }
    }
  ]
}
```
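To verify the new field end to end, here is a hypothetical client-side check with tritonclient; the ensemble input names text_input and max_tokens follow the stock tensorrtllm_backend config and are assumptions here:

```python
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient("localhost:8000")

text = np.array([["What is Triton?"]], dtype=object)
max_tokens = np.array([[64]], dtype=np.int32)

inputs = [
    httpclient.InferInput("text_input", list(text.shape), "BYTES"),
    httpclient.InferInput("max_tokens", list(max_tokens.shape), "INT32"),
]
inputs[0].set_data_from_numpy(text)
inputs[1].set_data_from_numpy(max_tokens)

result = client.infer("ensemble", inputs)
print(result.as_numpy("output_token_len"))  # the field added above
```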
You can use https://github.com/npuichigo/openai_trtllm; it is a wrapper that provides an OpenAI-compatible API on top of TensorRT-LLM.