[Usage]: Regarding VLLM Structured Output Doubts #9456

Open hwb96 opened 1 month ago

hwb96 commented 1 month ago

Your current environment

vllm                              0.6.1.post2
vllm-flash-attn                   2.6.1

How would you like to use vllm

The documentation describes the structured-output extra parameters as follows:

    add_special_tokens: bool = Field(
        default=True,
        description=(
            "If true (the default), special tokens (e.g. BOS) will be added to "
            "the prompt."),
    )
    response_format: Optional[ResponseFormat] = Field(
        default=None,
        description=
        ("Similar to chat completion, this parameter specifies the format of "
         "output. Only {'type': 'json_object'} or {'type': 'text' } is "
         "supported."),
    )
    guided_json: Optional[Union[str, dict, BaseModel]] = Field(
        default=None,
        description="If specified, the output will follow the JSON schema.",
    )
    guided_regex: Optional[str] = Field(
        default=None,
        description=(
            "If specified, the output will follow the regex pattern."),
    )
    guided_choice: Optional[List[str]] = Field(
        default=None,
        description=(
            "If specified, the output will be exactly one of the choices."),
    )
    guided_grammar: Optional[str] = Field(
        default=None,
        description=(
            "If specified, the output will follow the context free grammar."),
    )
    guided_decoding_backend: Optional[str] = Field(
        default=None,
        description=(
            "If specified, will override the default guided decoding backend "
            "of the server for this specific request. If set, must be one of "
            "'outlines' / 'lm-format-enforcer'"))
    guided_whitespace_pattern: Optional[str] = Field(
        default=None,
        description=(
            "If specified, will override the default whitespace pattern "
            "for guided json decoding."))
    priority: int = Field(
        default=0,
        description=(
            "The priority of the request (lower means earlier handling; "
            "default: 0). Any priority other than 0 will raise an error "
            "if the served model does not use priority scheduling."))

I see there are two parameters that support JSON-formatted output:

  1. response_format set to {'type': 'json_object'}, which aligns with the OpenAI API.
  2. guided_json, which works together with guided_decoding_backend (my understanding is that this chooses between 'outlines' and 'lm-format-enforcer'); see the sketch below.
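
To make mechanism 2 concrete, I believe something like the following exercises guided_json (a minimal sketch; the Sentiment model is a placeholder I made up):

from pydantic import BaseModel

class Sentiment(BaseModel):
    label: str

completion = client.chat.completions.create(
  model="mistralai/Mistral-7B-Instruct-v0.2",
  messages=[
    {"role": "user", "content": "Classify this sentiment: LMFE is wonderful!"}
  ],
  extra_body={
    # guided_json accepts a schema string, dict, or BaseModel per the field definition above
    "guided_json": Sentiment.schema_json(),
    "guided_decoding_backend": "outlines"
  }
)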

My question is: which parameter takes priority? For example, when I use:

completion = client.chat.completions.create(
  model="mistralai/Mistral-7B-Instruct-v0.2",
  messages=[
    {"role": "user", "content": "Classify this sentiment: LMFE is wonderful!"}
  ],
  extra_body={
    "guided_regex": "[Pp]ositive|[Nn]egative",
    "guided_decoding_backend": "lm-format-enforcer"
  }
)

Is response_format unnecessary here? And if I set response_format={'type': 'json_object'}, does that mean guided_json is enabled by default?
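
For reference, this is the response_format-only variant I have in mind (a sketch with the same placeholder model, no guided_* fields set):

completion = client.chat.completions.create(
  model="mistralai/Mistral-7B-Instruct-v0.2",
  messages=[
    {"role": "user", "content": "Classify this sentiment and reply as JSON: LMFE is wonderful!"}
  ],
  response_format={"type": "json_object"}
)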

Before submitting a new issue...

I'm really confused and looking forward to your answer, thank you!

sidharthrajaram commented 1 month ago

I have the same question ^

DarkLight1337 commented 1 month ago

@joerunde can you help answer this?

hpx502766238 commented 2 weeks ago

lm-format-enforcer does not seem to be fully compatible with the OpenAI SDK yet, but outlines is. So with lm-format-enforcer, I send a normal chat completions request (without response_format set) and then parse the message content manually:

class Test(BaseModel):
    xxxxx  # your schema fields here

completion = client.chat.completions.create(
    xxxx,  # model, messages, etc.; note that response_format is not set
    extra_body={
        "guided_json": Test.schema_json(),  # pass the schema string, not the unbound method
        "guided_decoding_backend": "lm-format-enforcer",
    },
)

raw_json = completion.choices[0].message.content
parsed = Test.parse_raw(raw_json)  # validate the content against the schema
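
For anyone who wants to try this end to end, a more complete sketch of the same workaround (assuming a vLLM OpenAI-compatible server on localhost; the Sentiment schema is made up):

from openai import OpenAI
from pydantic import BaseModel

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

class Sentiment(BaseModel):
    label: str

completion = client.chat.completions.create(
    model="mistralai/Mistral-7B-Instruct-v0.2",
    messages=[
        {"role": "user", "content": "Classify this sentiment: LMFE is wonderful!"}
    ],
    # deliberately no response_format here
    extra_body={
        "guided_json": Sentiment.schema_json(),
        "guided_decoding_backend": "lm-format-enforcer",
    },
)

# the constrained output comes back as plain message content; validate it manually
parsed = Sentiment.parse_raw(completion.choices[0].message.content)
print(parsed.label)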