Closed: warlock135 closed this issue 1 month ago
Hit this same exception
The docs here show that guided decoding and multi-step don't work together yet: https://docs.vllm.ai/en/latest/serving/compatibility_matrix.html
I think the appropriate behavior here is to respond with an HTTP status code other than 200 (e.g., 400, 500, or 501) and keep serving, rather than failing an assertion that crashes the engine. The current behavior means the service cannot be exposed directly to clients when multi-step is enabled, since a single request can bring it down.
Ah, yeah, I agree with that; it's a common problem in vLLM. Maybe I can find some time to make this return a 400 instead.
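For illustration, here is a minimal sketch of the kind of guard being discussed, assuming a FastAPI-style server. The exception class, the `NUM_SCHEDULER_STEPS` constant, and the endpoint body are hypothetical placeholders, not vLLM's actual internals; the point is only that the incompatible combination is rejected up front with a 400 instead of crashing on an assert.

```python
from fastapi import FastAPI, Request
from fastapi.responses import JSONResponse

app = FastAPI()

# Hypothetical exception standing in for the failed assertion; vLLM's real
# internals differ, and the names below are illustrative only.
class UnsupportedFeatureCombination(ValueError):
    pass

NUM_SCHEDULER_STEPS = 8  # stand-in for the --num-scheduler-steps setting

def validate_request(response_format: dict | None) -> None:
    # Guided decoding (JSON mode) and multi-step scheduling don't work
    # together yet, so reject the combination before generation starts.
    if NUM_SCHEDULER_STEPS > 1 and response_format \
            and response_format.get("type") == "json_object":
        raise UnsupportedFeatureCombination(
            "response_format=json_object is not supported with "
            "multi-step scheduling"
        )

@app.exception_handler(UnsupportedFeatureCombination)
async def unsupported_handler(request: Request,
                              exc: UnsupportedFeatureCombination) -> JSONResponse:
    # Surface the problem as HTTP 400 and keep the engine alive.
    return JSONResponse(status_code=400, content={"error": str(exc)})

@app.post("/v1/chat/completions")
async def chat_completions(payload: dict) -> dict:
    validate_request(payload.get("response_format"))
    return {"choices": []}  # placeholder; the real handler would run the engine
```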
Your current environment
Container: vllm/vllm-openai:v0.6.2
Models: Llama-3-70B-Instruct, Llama-3-8B-Instruct, Qwen-2.5-32B-Instruct
GPUs: A100, A30
Model Input Dumps
No response
🐛 Describe the bug
When using --num-scheduler-steps 8 together with a request that sets "response_format": { "type": "json_object" }, vLLM raises an error and then crashes. The error log:
Changing the response_format type to text, or removing num-scheduler-steps, makes everything work fine.
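For reference, a request of the following shape reproduces the crash against a server started with --num-scheduler-steps 8. The base URL, API key, model name, and prompt are placeholders for a locally running vLLM OpenAI-compatible server, not values taken from the report.

```python
# Hypothetical reproduction script using the OpenAI Python client (v1+).
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

completion = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3-8B-Instruct",
    messages=[{"role": "user",
               "content": "Return a JSON object with a 'greeting' key."}],
    # Requesting JSON mode triggers guided decoding, which is the
    # combination that crashes the engine under multi-step scheduling.
    response_format={"type": "json_object"},
)
print(completion.choices[0].message.content)
```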