If you use lmms-lab/llama3-llava-next-8b, you should not use the chat template from the Huggingface tokenizer, because that template does not support images. The correct way is to use the custom chat template for llava-next, which you can specify when you launch the server: https://github.com/sgl-project/sglang/blob/55f5976b42d736f3dfe2f8f9b91a6536c212744a/README.md?plain=1#L246-L247
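For reference, the launch command would look something like this (the `--chat-template=llava_llama_3` flag and the separate tokenizer path follow my reading of the README lines linked above; double-check them there):

```shell
python -m sglang.launch_server \
  --model-path lmms-lab/llama3-llava-next-8b \
  --tokenizer-path lmms-lab/llama3-llava-next-8b-tokenizer \
  --port=30000 \
  --chat-template=llava_llama_3
```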
Describe the bug
I noticed that the applied prompt template is incorrect because the text is not parsed correctly. More specifically, in `openai_api/adapter.py` we use the Huggingface tokenizer's chat template to build the prompt. But the request messages are still Pydantic models at that point, so they get incorrectly tokenized by the Huggingface tokenizer.
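Roughly, the adapter does something like this (a simplified sketch of the call site, not the exact code):

```python
# Sketch: the OpenAI-compatible adapter passes the Pydantic message objects
# straight into the Huggingface chat template instead of plain dicts.
prompt_ids = tokenizer.apply_chat_template(
    request.messages,  # still a list of Pydantic models, not dicts
    tokenize=True,
    add_generation_prompt=True,
)
```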
For example, if request.messages contains a user message with text and image parts, the prompt and prompt ids we get back contain a string-ified version of the Pydantic objects rather than a correctly rendered conversation.
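A runnable sketch of the stringification (the model classes here are illustrative stand-ins, not sglang's actual classes):

```python
from pydantic import BaseModel

# Illustrative stand-ins for the adapter's Pydantic message models.
class TextPart(BaseModel):
    type: str = "text"
    text: str

class Message(BaseModel):
    role: str
    content: list[TextPart]

msg = Message(role="user", content=[TextPart(text="What is in this image?")])

# A chat template that expects `content` to be a plain string ends up
# rendering the Python repr of the Pydantic list instead:
print(str(msg.content))
# [TextPart(type='text', text='What is in this image?')]
```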
I have a suspicion that this is because Huggingface does not know that we are using Pydantic data models, so it just looks at the `content` field of each message and renders it, which is just a string-ified version of the Pydantic data model.

Reproduction
```shell
python -m sglang.launch_server --model-path lmms-lab/llama3-llava-next-8b --port=30000
```
Client script:
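A minimal client along these lines (the prompt and image URL are placeholders) exercises the code path:

```python
import openai

# Talk to the locally launched sglang server via its OpenAI-compatible API.
client = openai.Client(base_url="http://127.0.0.1:30000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="default",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What is in this image?"},
                {"type": "image_url", "image_url": {"url": "https://example.com/image.png"}},
            ],
        }
    ],
    temperature=0,
)
print(response.choices[0].message.content)
```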
Environment