Background
vLLM currently exposes various model features through configuration parameters, but it lacks support for passing additional model-specific parameters through extra_body. This matters most for features such as structured output.
https://github.com/vllm-project/vllm/blob/v0.6.0/vllm/engine/arg_utils.py#L276
Current OpenAI implementation
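# Assumes `client` is an OpenAI client and `Test` is a pydantic model defined elsewhere.
completion = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Generate a user profile"}],
    # Keys in extra_body are sent as additional top-level fields of the request JSON.
    extra_body={
        # schema_json is a method on pydantic models, so it must be called.
        "guided_json": Test.schema_json(),
        "guided_decoding_backend": "lm-format-enforcer",
    },
)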
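To make the request above concrete, here is a minimal sketch of what happens to extra_body fields on the wire. The OpenAI Python client merges them into the top level of the JSON body, so a server has to recognise them next to the standard fields. The helper names (`merge_extra_body`, `split_request`) and the `STANDARD_FIELDS` set are hypothetical and exist only for illustration; they are not vLLM or OpenAI client APIs.

```python
from typing import Any, Dict, Optional, Tuple

# Hypothetical subset of standard chat-completion fields (illustration only).
STANDARD_FIELDS = {"model", "messages", "temperature", "max_tokens"}


def merge_extra_body(
    payload: Dict[str, Any], extra_body: Optional[Dict[str, Any]]
) -> Dict[str, Any]:
    """Client side: extra_body keys become top-level keys of the JSON body."""
    merged = dict(payload)  # copy so the caller's payload is not mutated
    merged.update(extra_body or {})
    return merged


def split_request(body: Dict[str, Any]) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Server side (assumed design): separate standard fields from extras."""
    standard = {k: v for k, v in body.items() if k in STANDARD_FIELDS}
    extra = {k: v for k, v in body.items() if k not in STANDARD_FIELDS}
    return standard, extra


if __name__ == "__main__":
    body = merge_extra_body(
        {
            "model": "gpt-3.5-turbo",
            "messages": [{"role": "user", "content": "Generate a user profile"}],
        },
        {
            "guided_json": {"type": "object"},
            "guided_decoding_backend": "lm-format-enforcer",
        },
    )
    standard, extra = split_request(body)
    print(sorted(standard))
    print(sorted(extra))
```

Under these assumptions, any field the server does not list as standard ends up in `extra`, which is where model-specific options such as guided_json would have to be picked up.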
Proposed implementation