pseudotensor closed this issue 6 months ago.
Thank you for the report. I think this is because newline is an allowed token, so the model can spam new lines indefinitely if it chooses to. This commonly occurs when the model is not well prompted or finetuned for this workload. Even with guided JSON, giving a few-shot examples could be helpful, so prompting would be the simplest fix.
I can think of two slightly more complex fixes:
Thanks, understood, but so many other models have no issue with the same prompt. So while prompting might help, it would be great to treat this as an edge case that could make vLLM more reliable in general.
I noticed that in the vLLM test you put the JSON template in the prompt itself. I suppose that might help, but I didn't see it required for any other models.
Disallowing repeated white space sounds like a reasonable solution: it serves no purpose, and removing it is not an issue for strict JSON compliance.
I confirmed that adding the JSON template to the prompt seems to work, but I recommend still constraining the regexp behind guided_json so that multiple white spaces are not allowed between keys and values and in other such non-quoted locations.
You need to instruct the model exactly what you expect it to do. The guided_json parameter (and its siblings) will only enforce that the generation matches the spec. You still need to nudge the LLM to produce output in the format you want by adding it to your prompt.
@br3no That's fine, but other models have no issues following the schema, so I feel this is an issue that could potentially happen with any model.
There's absolutely no reason for the regexp going into guided_json to allow arbitrary amounts of white space between keys and values.
> There's absolutely no reason for the regexp going into guided_json to allow arbitrary amounts of white space between keys and values.
Absolutely no reason? Unless you know with certainty what models have seen during training, the most reasonable thing you can do is assume inputs follow the JSON spec more generally: https://www.json.org/json-en.html
@rlouf Your webpage doesn't say anything about the amount of white space.
My point is that JSON is defined so that the amount of white space does not matter, so we might as well restrict generation to include only the minimal amount of white space.
E.g., use one space after the name-separator (colon). In this case, vLLM did not do this.
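To sketch what I mean (a hypothetical pattern and example strings, just for illustration, not vLLM's current behavior): a whitespace pattern that permits at most one space would still accept normal JSON while ruling out runs of newlines:

```python
import json
import re

# Hypothetical whitespace pattern: zero or one space, never a newline.
WHITESPACE_PATTERN = r" ?"

# Both strings parse to identical JSON; the constraint would simply
# pick the compact form during generation.
compact = '{"name": "value"}'
padded = '{"name":\n\n\n"value"}'
assert json.loads(compact) == json.loads(padded)

# The padded form contains whitespace the pattern would not allow.
assert re.fullmatch(WHITESPACE_PATTERN, "\n\n\n") is None
assert re.fullmatch(WHITESPACE_PATTERN, " ") is not None
```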
> @rlouf Your webpage doesn't say anything about the amount of white space.
It gives the grammar, which defines what is considered valid JSON.
> My point is that JSON is defined so that the amount of white space does not matter, so we might as well restrict generation to include only the minimal amount of white space.
To my point: do you know what models have seen during training? What if you impose this restriction and models have seen something different? What if different model training pipelines pre-process JSON differently, or don't pre-process it at all?
@rlouf I've provided a real case that shows a problem with how it's done right now, and there is a solution. Your reply is just speculative questions without any evidence of an actual problem.
Are you suggesting we should only accommodate one particular use case?
I suggest we address real problems with reasonable solutions. The proposal to limit the white space between keys and values is perfectly compatible with recommendations for how JSON should be constructed.
If we are worried about being too limiting, then OK, limit to no more than 3 spaces. But in my case the model emitted 1000 new lines :) I don't think that lack of constraint is useful, and I also don't think new lines as white space between a key and a value are normal JSON.
@rlouf BTW, amazing project (outlines).
> I suggest we address real problems with reasonable solutions. The proposal to limit the white space between keys and values is perfectly compatible with recommendations for how JSON should be constructed.
We could limit to max 4 spaces and one line break; I doubt models have seen many objects with more white space and line breaks. BTW, you can already test for this in Outlines directly: we have a `whitespace_pattern` keyword argument to `generate.json`.
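A sketch of what that looks like (placeholder model and schema; `whitespace_pattern` is the keyword argument mentioned above):

```python
import outlines

# Placeholder model; any model supported by outlines.models works here.
model = outlines.models.transformers("mistralai/Mistral-7B-Instruct-v0.2")

# Example schema for illustration.
schema = """{
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer"}
    },
    "required": ["name", "age"]
}"""

# Allow at most a single space between structural tokens, so the model
# cannot pad the output with long runs of spaces or newlines.
generator = outlines.generate.json(model, schema, whitespace_pattern=r" ?")
result = generator("Extract the person from: John is 30 years old.")
print(result)  # e.g. {'name': 'John', 'age': 30}
```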
Is prompting working though?
@rlouf There are pros and cons. The problem with prompting is that a schema may be 2000 tokens, and that wastes a lot of tokens. I did do that and it works, but I'd prefer not to have to. It's a big con.
Other models have no issue at all without the schema, e.g.:
vllm_base_models = ['h2oai/h2ogpt-4096-llama2-70b-chat',
'HuggingFaceH4/zephyr-7b-beta', 'mistralai/Mistral-7B-Instruct-v0.2', 'openchat/openchat-3.5-1210',
'h2oai/h2ogpt-32k-codellama-34b-instruct', 'NousResearch/Nous-Capybara-34B',
'mistralai/Mixtral-8x7B-Instruct-v0.1',
'h2oai/h2o-danube2-1.8b-chat',
'google/gemma-1.1-7b-it', 'h2oai/mixtral-gm-rag-experimental-v2',
'databricks/dbrx-instruct', 'CohereForAI/c4ai-command-r-v01']
for the test case I considered.
I don't really see any cons to limiting the white space, especially if we let it go out to 3-4 spaces and also exclude new lines from the white space (only one at the end, as you said).
Could you give this a try using `generate.json` in Outlines and play with `whitespace_pattern`? We have an integration for vLLM's offline interface.
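A rough sketch with the offline interface (placeholder model and schema; the import path and signature of `JSONLogitsProcessor` may differ between Outlines versions, so treat this as an assumption rather than the definitive API):

```python
from vllm import LLM, SamplingParams
from outlines.integrations.vllm import JSONLogitsProcessor  # path varies by version

schema = """{
    "type": "object",
    "properties": {"name": {"type": "string"}},
    "required": ["name"]
}"""

llm = LLM(model="mistralai/Mistral-7B-Instruct-v0.2")  # placeholder model

# Constrain generation to the schema, with inter-token whitespace
# limited to a single optional space.
processor = JSONLogitsProcessor(schema, llm, whitespace_pattern=r" ?")
params = SamplingParams(max_tokens=128, logits_processors=[processor])

outputs = llm.generate(["Return a JSON object with a name."], params)
print(outputs[0].outputs[0].text)
```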
And let's open a discussion in Outlines to not pollute vLLM's maintainers' notifications :)
I ran into this problem with Llama 3 8B Instruct and solved it by running this PR #4305 and setting `guided_whitespace_pattern` to `" "`.
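For anyone hitting this through the OpenAI-compatible server, a sketch of the request (placeholder endpoint and model; assumes the `guided_whitespace_pattern` field added by that PR):

```python
from openai import OpenAI

# Placeholder endpoint for a locally running vLLM server.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

schema = {
    "type": "object",
    "properties": {"name": {"type": "string"}},
    "required": ["name"],
}

completion = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3-8B-Instruct",  # placeholder model
    messages=[{"role": "user", "content": "Return a JSON object with a name."}],
    extra_body={
        "guided_json": schema,
        # Collapse all inter-token whitespace to a single space so the
        # model cannot emit long runs of newlines between keys and values.
        "guided_whitespace_pattern": " ",
    },
)
print(completion.choices[0].message.content)
```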
Same problem with phi-3-mini-instruct as well
@rlouf FYI I hit this even with 'mistralai/Mixtral-8x7B-Instruct-v0.1', even when providing the schema to the model. Setting the pattern to '[ \t\n]' or '\n' failed to work, but ' ' did OK, as @robcaulk mentioned above.
Your current environment
🐛 Describe the bug
Using vllm 0.4.0.post1
Streaming or not doesn't matter; it happens the same way every time. I get this response:
Here the ... stands for a MASSIVE number of new lines.
I've tried about 10 other models; all are OK except llama2-13b.
Others that work: