ywang96 opened this issue 3 weeks ago
cc @patrickvonplaten - I haven't spent much time debugging the inconsistency, but I did confirm it's an issue on the vLLM side (we were only recently informed about it by Chatbot Arena). It would be great if you could take a look, or share any idea of why this is happening, so we can fix it ASAP. Thanks!
Hey @ywang96,
Thanks for the ping - checking!
Just confirmed this is happening on text-only models too, so there's indeed something wrong with the detokenization in vLLM right now...
```python
from pathlib import Path

from huggingface_hub import snapshot_download
from mistral_common.tokens.tokenizers.mistral import MistralTokenizer
from vllm import LLM, SamplingParams

model_name = "mistralai/Mistral-Nemo-Instruct-2407"
mistral_models_path = Path.home().joinpath("mistral_models", "Pixtral")
mistral_models_path.mkdir(parents=True, exist_ok=True)
snapshot_download(repo_id=model_name, allow_patterns=["tekken.json"], local_dir=mistral_models_path)

tokenizer = MistralTokenizer.from_file(f"{mistral_models_path}/tekken.json")  # reference MistralTokenizer

sampling_params = SamplingParams(temperature=0.0, max_tokens=8192)
llm = LLM(model=model_name, tokenizer_mode="mistral", enforce_eager=True, tensor_parallel_size=8)

prompt = "今天天气如何?"  # "What's the weather like today?"
messages = [
    {
        "role": "user",
        "content": prompt,
    },
]
outputs = llm.chat(messages, sampling_params=sampling_params)

print(outputs[0].outputs[0].text)       # vLLM's detokenized text (garbled)
print(outputs[0].outputs[0].token_ids)  # raw token ids
# Decoding the same ids (minus the final EOS token) with the reference tokenizer:
print(tokenizer.decode(outputs[0].outputs[0].token_ids[:-1]))
```
Output:
很抱歉,我无法提供实时天气信息,因为我是一个文本生成模型,我无法��问实时数据。但是,您可以���索您所在地区的天气��报,或者查看当地的天气应用程序来获取最新的天气信息。
(13440, 81040, 1625, 3621, 13244, 10628, 113521, 6892, 4022, 6434, 35459, 15690, 47424, 1625, 14966, 3621, 2499, 26535, 11449, 5296, 7360, 5862, 86061, 24308, 1625, 3621, 13244, 10628, 5538, 1191, 9915, 6892, 4022, 128593, 1320, 5859, 1625, 48423, 18921, 1230, 6423, 73291, 48423, 5536, 2998, 71867, 2713, 6434, 35459, 12684, 1132, 24549, 1625, 22516, 37706, 9764, 5342, 8736, 2713, 6434, 35459, 34590, 12600, 31479, 55550, 4976, 68826, 32128, 7695, 11795, 2713, 6434, 35459, 15690, 47424, 1320, 2)
很抱歉,我无法提供实时天气信息,因为我是一个文本生成模型,我无法访问实时数据。但是,您可以搜索您所在地区的天气预报,或者查看当地的天气应用程序来获取最新的天气信息。
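A minimal sketch of what this corruption pattern suggests (my hypothesis, not the confirmed root cause): if a streaming detokenizer decodes each token's bytes independently, a multibyte UTF-8 character whose bytes are split across token boundaries decodes to U+FFFD replacement characters, while decoding the full byte sequence at once is clean:

```python
# "访问" ("access") appears garbled as "��问" in the vLLM output above.
text = "访问"              # two CJK characters, 3 UTF-8 bytes each
data = text.encode("utf-8")

# Decoding the complete byte sequence round-trips cleanly.
assert data.decode("utf-8") == "访问"

# Decoding an arbitrary split of the same bytes independently, as a naive
# per-token streaming decoder might, corrupts both halves at the boundary.
head, tail = data[:2], data[2:]
broken = head.decode("utf-8", errors="replace") + tail.decode("utf-8", errors="replace")
print(broken)  # '��问' — the same replacement-character pattern as above
```

This would also explain why `tokenizer.decode(...)` on the full token id list is unaffected: it only decodes once, after all bytes are available.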
As far as I can tell, this is happening with Korean/Hangul too. I'll also take a look if I have some bandwidth today!
Hey @ywang96,
Yes here is a fix: https://github.com/vllm-project/vllm/pull/8640
Essentially, the problem comes from the following:
The PR linked above should fix it.
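The usual fix for this class of streaming-detokenization bug can be sketched as follows (a sketch of the general technique, not the exact implementation in the PR): buffer raw bytes across tokens and only emit text once the buffer decodes as valid UTF-8, instead of decoding each token's bytes in isolation:

```python
def stream_decode(byte_chunks):
    """Incrementally decode UTF-8 byte chunks, holding back incomplete
    multibyte sequences until the remaining bytes arrive."""
    buf = b""
    out = []
    for chunk in byte_chunks:
        buf += chunk
        try:
            out.append(buf.decode("utf-8"))
            buf = b""
        except UnicodeDecodeError:
            # Incomplete multibyte sequence; wait for more bytes.
            # (A real implementation would cap the buffer so genuinely
            # invalid bytes can't stall the stream forever.)
            continue
    return "".join(out)

# "访" is E8 AE BF in UTF-8; even split mid-character it decodes cleanly.
print(stream_decode([b"\xe8\xae", b"\xbf\xe9\x97\xae"]))  # 访问
```

Python's `codecs.getincrementaldecoder("utf-8")` implements the same buffering behavior in the standard library.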
@patrickvonplaten Thank you for your great work! I was using your branch, but I hit a weird issue: there seems to be a KeyError when decoding some Chinese characters.
Prompt:
Error:
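Purely as an illustration of how this shape of failure typically arises (the actual traceback isn't shown here, and all names below are hypothetical): a KeyError during detokenization usually means a token id was looked up in a mapping that doesn't cover the whole vocabulary, e.g. a partial special-token map:

```python
# Hypothetical partial id->token map that misses regular-vocab ids.
special_token_map = {0: "<unk>", 1: "<s>", 2: "</s>"}

def decode_with_map(token_ids):
    # Naive decode that assumes every id is in the map.
    return "".join(special_token_map[t] for t in token_ids)

try:
    # 13440 is a regular-vocab id taken from the token ids printed above.
    decode_with_map([1, 13440, 2])
except KeyError as exc:
    print(f"KeyError: {exc}")  # KeyError: 13440
```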
Hey @BabyChouSr,
Can you try again with the current "main", and if it still fails, can you post a reproducible code snippet here? :-)
Your current environment
The output of `python collect_env.py`
Model Input Dumps
Code to repro
🐛 Describe the bug
When the engine is initialized with `tokenizer_mode="mistral"`, there is an encoding error in the detokenized output for certain languages. However, when using an initialized `MistralTokenizer` to decode the token ids directly, there is no such issue (see the output from the code above).