Open thanhtung901 opened 6 months ago
Hi @thanhtung901, looking into this.
Hi @thanhtung901, could you please clarify which version of Triton you are using, and whether you are using the vLLM backend or deploying a vLLM model through the Python backend?
I am using `nvcr.io/nvidia/tritonserver:24.02-vllm-python-py3`.
Hi everyone, I am running Triton Server with vLLM and I want to use dynamic batching, but I encountered an error. It seems to have something to do with my input. Inference with curl:

```
curl -X POST localhost:8000/v2/models/vllm_model/generate -d '{"text_input": "What is Triton Inference Server?", "parameters": {"stream": false, "temperature": 0}}'
```

The output is:

```
{"error":"Error generating stream: TextEncodeInput must be Union[TextInputSequence, Tuple[InputSequence, InputSequence]]"}
```
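For reference, the same request can be reproduced from Python with only the standard library. This is a sketch of the payload the curl command above sends to the `/v2/models/vllm_model/generate` endpoint; the commented-out POST assumes a Triton server is listening on `localhost:8000`.

```python
import json

# Payload equivalent to the curl command above, for Triton's
# /v2/models/<model>/generate endpoint.
payload = {
    "text_input": "What is Triton Inference Server?",
    "parameters": {"stream": False, "temperature": 0},
}

body = json.dumps(payload)
print(body)

# To actually send it (requires a running server on localhost:8000):
# import urllib.request
# req = urllib.request.Request(
#     "http://localhost:8000/v2/models/vllm_model/generate",
#     data=body.encode(),
#     headers={"Content-Type": "application/json"},
# )
# print(urllib.request.urlopen(req).read().decode())
```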
My `config.txt`:

```
backend: "vllm"
max_batch_size: 4
dynamic_batching {
  max_queue_delay_microseconds: 1000
}
model_transaction_policy {
  decoupled: True
}
input [
  {
    name: "text_input"
    data_type: TYPE_STRING
    dims: [ -1 ]
  },
  {
    name: "stream"
    data_type: TYPE_BOOL
    dims: [ 1 ]
  },
  {
    name: "sampling_parameters"
    data_type: TYPE_STRING
    dims: [ 1 ]
    optional: true
  }
]
output [
  {
    name: "text_output"
    data_type: TYPE_STRING
    dims: [ -1 ]
  }
]
instance_group [
  {
    count: 1
    kind: KIND_CPU
  }
]
```
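As a side note on that config: since `sampling_parameters` is declared as `TYPE_STRING`, the vLLM backend expects it as a JSON-serialized string of sampling fields. Below is a hedged sketch of what a raw `/infer`-style request body for this config might look like; the exact field names inside `sampling_parameters` follow vLLM's `SamplingParams` and should be verified against your vLLM version, and the `[1, 1]` shapes assume the leading batch dimension implied by `max_batch_size: 4`.

```python
import json

# JSON string carried by the TYPE_STRING "sampling_parameters" input
# (field names are assumed to match vLLM's SamplingParams).
sampling = json.dumps({"temperature": 0.0, "max_tokens": 64})

# Sketch of a KServe-v2 /infer request body matching the config above.
# Shapes include a leading batch dim of 1 because max_batch_size > 0.
infer_body = {
    "inputs": [
        {"name": "text_input", "datatype": "BYTES", "shape": [1, 1],
         "data": ["What is Triton Inference Server?"]},
        {"name": "stream", "datatype": "BOOL", "shape": [1, 1],
         "data": [False]},
        {"name": "sampling_parameters", "datatype": "BYTES", "shape": [1, 1],
         "data": [sampling]},
    ]
}
print(json.dumps(infer_body))
```

This is only an illustration of how the declared inputs line up with a request body, not a confirmed fix for the error above.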