[Usage]: Regarding VLLM Structured Output Doubts #9456

Open hwb96 opened 1 month ago

hwb96 commented 1 month ago

Your current environment

vllm                              0.6.1.post2
vllm-flash-attn                   2.6.1

How would you like to use vllm

The documentation describes the structured-output extra parameters as follows:

    add_special_tokens: bool = Field(
        default=True,
        description=(
            "If true (the default), special tokens (e.g. BOS) will be added to "
            "the prompt."),
    )
    response_format: Optional[ResponseFormat] = Field(
        default=None,
        description=
        ("Similar to chat completion, this parameter specifies the format of "
         "output. Only {'type': 'json_object'} or {'type': 'text' } is "
         "supported."),
    )
    guided_json: Optional[Union[str, dict, BaseModel]] = Field(
        default=None,
        description="If specified, the output will follow the JSON schema.",
    )
    guided_regex: Optional[str] = Field(
        default=None,
        description=(
            "If specified, the output will follow the regex pattern."),
    )
    guided_choice: Optional[List[str]] = Field(
        default=None,
        description=(
            "If specified, the output will be exactly one of the choices."),
    )
    guided_grammar: Optional[str] = Field(
        default=None,
        description=(
            "If specified, the output will follow the context free grammar."),
    )
    guided_decoding_backend: Optional[str] = Field(
        default=None,
        description=(
            "If specified, will override the default guided decoding backend "
            "of the server for this specific request. If set, must be one of "
            "'outlines' / 'lm-format-enforcer'"))
    guided_whitespace_pattern: Optional[str] = Field(
        default=None,
        description=(
            "If specified, will override the default whitespace pattern "
            "for guided json decoding."))
    priority: int = Field(
        default=0,
        description=(
            "The priority of the request (lower means earlier handling; "
            "default: 0). Any priority other than 0 will raise an error "
            "if the served model does not use priority scheduling."))

I see there are two parameters that support JSON-formatted output:

  1. response_format set to {'type': 'json_object'}, which aligns with the OpenAI API.
  2. guided_json, which works together with guided_decoding_backend (my understanding is that this chooses between 'outlines' and 'lm-format-enforcer'); see the sketch below.
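
To make mechanism 2 concrete, I believe something like the following exercises guided_json (a minimal sketch; the Sentiment model is a placeholder I made up):

from pydantic import BaseModel

class Sentiment(BaseModel):
    label: str

completion = client.chat.completions.create(
  model="mistralai/Mistral-7B-Instruct-v0.2",
  messages=[
    {"role": "user", "content": "Classify this sentiment: LMFE is wonderful!"}
  ],
  extra_body={
    # guided_json accepts a schema string, dict, or BaseModel per the field definition above
    "guided_json": Sentiment.schema_json(),
    "guided_decoding_backend": "outlines"
  }
)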

My question is: which parameter takes priority? For example, when I use:

completion = client.chat.completions.create(
  model="mistralai/Mistral-7B-Instruct-v0.2",
  messages=[
    {"role": "user", "content": "Classify this sentiment: LMFE is wonderful!"}
  ],
  extra_body={
    "guided_regex": "[Pp]ositive|[Nn]egative",
    "guided_decoding_backend": "lm-format-enforcer"
  }
)

Is response_format unnecessary here? And if I set response_format={'type': 'json_object'}, does that mean guided_json is enabled by default?
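
For reference, this is the response_format-only variant I have in mind (a sketch with the same placeholder model, no guided_* fields set):

completion = client.chat.completions.create(
  model="mistralai/Mistral-7B-Instruct-v0.2",
  messages=[
    {"role": "user", "content": "Classify this sentiment and reply as JSON: LMFE is wonderful!"}
  ],
  response_format={"type": "json_object"}
)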

Before submitting a new issue...

I'm really confused and looking forward to your answer, thank you!

sidharthrajaram commented 1 month ago

I have the same question ^

DarkLight1337 commented 1 month ago

@joerunde can you help answer this?

hpx502766238 commented 2 weeks ago

lm-format-enforcer does not seem to be fully compatible with the OpenAI SDK yet, but outlines is. So with lm-format-enforcer, I send a normal chat completions request (without response_format set) and then parse the message content manually:

class Test(BaseModel):
    xxxxx  # your schema fields here

completion = client.chat.completions.create(
    xxxx,  # model, messages, etc.; note that response_format is not set
    extra_body={
        "guided_json": Test.schema_json(),  # pass the schema string, not the unbound method
        "guided_decoding_backend": "lm-format-enforcer",
    },
)

raw_json = completion.choices[0].message.content
parsed = Test.parse_raw(raw_json)  # validate the content against the schema
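
For anyone who wants to try this end to end, a more complete sketch of the same workaround (assuming a vLLM OpenAI-compatible server on localhost; the Sentiment schema is made up):

from openai import OpenAI
from pydantic import BaseModel

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

class Sentiment(BaseModel):
    label: str

completion = client.chat.completions.create(
    model="mistralai/Mistral-7B-Instruct-v0.2",
    messages=[
        {"role": "user", "content": "Classify this sentiment: LMFE is wonderful!"}
    ],
    # deliberately no response_format here
    extra_body={
        "guided_json": Sentiment.schema_json(),
        "guided_decoding_backend": "lm-format-enforcer",
    },
)

# the constrained output comes back as plain message content; validate it manually
parsed = Sentiment.parse_raw(completion.choices[0].message.content)
print(parsed.label)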