triton-inference-server / server

The Triton Inference Server provides an optimized cloud and edge inferencing solution.
https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/index.html
BSD 3-Clause "New" or "Revised" License

Error generating stream: TextEncodeInput must be Union[TextInputSequence, Tuple[InputSequence, InputSequence]] #6997

Open thanhtung901 opened 6 months ago

thanhtung901 commented 6 months ago

Hi everyone, I am running Triton Server with the vLLM backend and I want to use dynamic batching, but I encountered an error. It seems to have something to do with my input.

Inference with curl:

curl -X POST localhost:8000/v2/models/vllm_model/generate -d '{"text_input": "What is Triton Inference Server?", "parameters": {"stream": false, "temperature": 0}}'

The output is:

{"error":"Error generating stream: TextEncodeInput must be Union[TextInputSequence, Tuple[InputSequence, InputSequence]]"}
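For reference, a minimal Python sketch of the same request against the generate endpoint, in case it is easier to iterate on than curl (it assumes the server is reachable on localhost:8000 and the model is named vllm_model, as in the command above):

```python
# Minimal sketch: send the same payload to Triton's HTTP generate endpoint
# using the standard `requests` library. Server address and model name are
# taken from the curl command above.
import requests

payload = {
    "text_input": "What is Triton Inference Server?",
    "parameters": {"stream": False, "temperature": 0},
}
resp = requests.post(
    "http://localhost:8000/v2/models/vllm_model/generate",
    json=payload,
    timeout=60,
)
print(resp.status_code)
print(resp.text)  # either the generated text or the error JSON shown above
```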

My config.pbtxt:

backend: "vllm"
max_batch_size: 4
dynamic_batching {
  max_queue_delay_microseconds: 1000
}
model_transaction_policy {
  decoupled: True
}
input [
  {
    name: "text_input"
    data_type: TYPE_STRING
    dims: [ -1 ]
  },
  {
    name: "stream"
    data_type: TYPE_BOOL
    dims: [ 1 ]
  },
  {
    name: "sampling_parameters"
    data_type: TYPE_STRING
    dims: [ 1 ]
    optional: true
  }
]
output [
  {
    name: "text_output"
    data_type: TYPE_STRING
    dims: [ -1 ]
  }
]
instance_group [
  {
    count: 1
    kind: KIND_CPU
  }
]
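For comparison, here is a minimal sketch of a config along the lines of the sample shipped with the vLLM backend (this layout is my assumption, not taken from this thread; as I understand it, the backend can auto-complete the text_input/stream/sampling_parameters/text_output tensors and batches requests through vLLM's own scheduler rather than Triton's dynamic batcher):

```
backend: "vllm"
instance_group [
  {
    count: 1
    kind: KIND_MODEL
  }
]
```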

indrajit96 commented 6 months ago

Hi @thanhtung901, looking into this.

oandreeva-nv commented 6 months ago

Hi @thanhtung901, could you please clarify which version of Triton you are using, and whether you are using the vLLM backend or deploying a vLLM model through the Python backend?

thanhtung901 commented 6 months ago

> Hi @thanhtung901, could you please clarify which version of Triton you are using, and whether you are using the vLLM backend or deploying a vLLM model through the Python backend?

I am using nvcr.io/nvidia/tritonserver:24.02-vllm-python-py3.
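In case it helps answer the version question above, here is a small sketch that queries the running server's metadata endpoint to confirm the exact Triton version inside that container (it assumes the server is up and reachable on localhost:8000):

```python
# Minimal sketch: query Triton's KServe v2 server metadata endpoint to
# confirm the running server version and enabled extensions.
import requests

meta = requests.get("http://localhost:8000/v2", timeout=10).json()
print(meta.get("name"), meta.get("version"))   # server name and version string
print(meta.get("extensions"))                  # enabled protocol extensions
```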