Open · johnathanchiu opened 1 month ago
I am using torchserve to spin up the vLLM instance (https://github.com/pytorch/serve?tab=readme-ov-file#-quick-start-llm-deployment-with-docker).
The recommended way to serve vLLM models is to use the vllm serve CLI. I don't think we directly support external ways of serving vLLM out of the box; it's up to the external library maintainers to fix compatibility issues. You should open an issue in the torchserve repo instead.
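For orientation, a minimal sketch of the recommended path, assuming a model has already been started with vllm serve on the default port 8000 (the model name and prompt below are placeholders, not values from this report):

# Minimal client for a server started with: vllm serve Qwen/Qwen2.5-1.5B-Instruct
# Assumes the default OpenAI-compatible endpoint on localhost:8000.
import requests

resp = requests.post(
    "http://localhost:8000/v1/completions",
    json={
        "model": "Qwen/Qwen2.5-1.5B-Instruct",  # placeholder model name
        "prompt": "Hello, my name is",
        "max_tokens": 64,
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["text"])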
Your current environment
I am using torchserve to spin up the vLLM instance (https://github.com/pytorch/serve?tab=readme-ov-file#-quick-start-llm-deployment-with-docker).

Model Input Dumps
No response
🐛 Describe the bug
Here's the stacktrace I see:
On the client side I am running this script:
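(The script itself is not included in this extract. For orientation only, a hypothetical client against TorchServe's standard inference endpoint might look roughly like the sketch below; the model name and payload keys are assumptions, not the reporter's actual values.)

# Hypothetical client sketch -- NOT the original script from this report.
# Assumes torchserve is running locally with a vLLM-backed model registered as "llm" (assumed name).
import requests

resp = requests.post(
    "http://localhost:8080/predictions/llm",  # standard TorchServe inference endpoint
    json={"prompt": "Hello, my name is", "max_new_tokens": 64},  # payload keys are assumptions
    timeout=60,
)
resp.raise_for_status()
print(resp.text)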
Before submitting a new issue...