Open Quang-elec44 opened 3 days ago
cc @merrymercy
Thanks for reporting this. I can reproduce the error. The problem is that this model works better with multi-line (pretty-printed) JSON, but the default argument in sglang enforces single-line JSON. We can fix this for both the outlines and xgrammar backends.
You can add --constrained-json-whitespace-pattern "[\n\t ]*"
when you launch the server. See also #1438
python3 -m sglang.launch_server --model Qwen/Qwen2.5-7B-Instruct-AWQ --port 8007 --quantization awq_marlin --grammar-backend outlines --constrained-json-whitespace-pattern "[\n\t ]*"
Output
{
"model": {
"name": "Mistral 7B",
"number_of_parameters": "7 billion",
"number_of_max_tokens": "",
"architecture": ["grouped-query attention (GQA)", "sliding window attention (SWA)"]
},
"usage": {
"use_case": ["superior performance and efficiency", "reasoning", "mathematics", "code generation", "following instructions"],
"license": "Apache 2.0"
}
}
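To see why the flag matters, here is a minimal sketch (assuming the flag value is an ordinary regex applied to the whitespace between JSON tokens, as its name suggests): `[\n\t ]*` accepts the newline-and-indent runs a pretty-printing model emits, while a single-line pattern such as `[ \t]*` (used here purely for contrast, not sglang's actual default) rejects them.

```python
import re

# The pattern passed via --constrained-json-whitespace-pattern.
multi_line = re.compile(r"[\n\t ]*")
# A hypothetical single-line pattern, shown only for contrast.
single_line = re.compile(r"[ \t]*")

# Inter-token whitespace as emitted by a pretty-printing model:
# a newline followed by indentation.
ws = "\n    "

print(multi_line.fullmatch(ws) is not None)   # True: newlines allowed
print(single_line.fullmatch(ws) is not None)  # False: newlines rejected
```

So with the multi-line pattern, the grammar can accept the pretty-printed tokens the model prefers to generate.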
Try this commit https://github.com/sgl-project/sglang/commit/dd4482e3c796437c4ea67a80ed685667ad8ff947
python3 -m sglang.launch_server --model Qwen/Qwen2.5-7B-Instruct-AWQ --port 8007 --quantization awq_marlin --grammar-backend xgrammar
Output
{
"model": {
"name": "Mistral 7B",
"number_of_parameters": "7 billion",
"number_of_max_tokens": "",
"architecture": []
},
"usage": {
"use_case": [],
"license": "Apache 2.0"
}
}
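Parsing the two outputs side by side makes the difference concrete: both are valid JSON, but the xgrammar run above loses the array contents that the outlines run kept. A quick check, with compact equivalents of the two outputs pasted in as strings:

```python
import json

# Outlines backend output (condensed from the log above).
outlines_out = """{
  "model": {"name": "Mistral 7B", "number_of_parameters": "7 billion",
            "number_of_max_tokens": "",
            "architecture": ["grouped-query attention (GQA)",
                             "sliding window attention (SWA)"]},
  "usage": {"use_case": ["superior performance and efficiency", "reasoning",
                         "mathematics", "code generation",
                         "following instructions"],
            "license": "Apache 2.0"}
}"""

# xgrammar backend output: same scalar fields, but empty arrays.
xgrammar_out = """{
  "model": {"name": "Mistral 7B", "number_of_parameters": "7 billion",
            "number_of_max_tokens": "", "architecture": []},
  "usage": {"use_case": [], "license": "Apache 2.0"}
}"""

a, b = json.loads(outlines_out), json.loads(xgrammar_out)
print(len(a["model"]["architecture"]))  # 2 entries with outlines
print(len(b["model"]["architecture"]))  # 0 entries with xgrammar
```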
However, this fix does not work. I think there are some subtle details in how it handles whitespace @Ubospica .
I am also facing a weird issue when using xgrammar as the backend; not sure if this is related.
I am using document prefix caching to run multiple extractions at the same time. Some of them use structured JSON output, and some of them output plain text. When using xgrammar with sgl.gen and a json_schema, the plain-text outputs change, and sometimes do not even terminate. The weird thing is that the plain-text generations do not use json_schema at all.
With outlines, everything works as expected.
Is sgl.gen with json_schema on the xgrammar backend even supported yet? I can provide more information to reproduce if necessary (but it is custom code).
Thanks,
Checklist
Describe the bug
The results (with and without a JSON schema) are different, while those generated from the vllm server (v0.6.4.post1) remain the same.
Reproduction
How to start the sglang server
How to start the vllm server
Python script
Results without json_schema
vllm
sglang
Results with json_schema
vllm
sglang (xgrammar backend)
sglang (outlines backend)
Environment