Closed pseudotensor closed 5 months ago
Here's a simpler repro that happens about 90% of the time.
prompt_llm = """<s>[INST] In order to write a concise single-paragraph summary, pay attention to the following text:
\"\"\"
The Commonwealth Bank of Australia (CBA) reported strong financial results for the first half of fiscal year 2023, with a statutory net profit after tax of AUD 5.216 billion, up 10% from the same period last year. Cash net profit after tax stood at AUD 5.153 billion, a 9% increase. Operating performance also improved by 18% to AUD 7.820 billion. The bank's home and consumer lending gross lending reached AUD 77 billion, while business and corporate lending gross lending amounted to AUD 18 billion. CBA's net promoter scores (NPS) remained high, with the bank ranking first in the consumer, business, and institutional categories. The bank's liquid assets and deposit funding increased, and its weighted average maturity stood at 5.8 years. CBA's CET1 ratio was 11.4%, and it declared a dividend per share of AUD 2.10 (35 cents). However, the bank warned that forward-looking statements should be treated with caution due to current economic uncertainties and geopolitical risks.
\"\"\"
Using only the text above, write a condensed and concise summary of key results (preferably as one paragraph):
[/INST]"""
from openai import OpenAI

base_url = 'FILLME'  # your OpenAI-compatible endpoint
base_model = 'mistralai/Mixtral-8x7B-Instruct-v0.1'
api_key = 'EMPTY'
stream_output = False

openai_client = OpenAI(base_url=base_url, api_key=api_key)
responses = openai_client.completions.create(
    model=base_model,
    prompt=prompt_llm,
    max_tokens=1024,
    temperature=0,
    stream=stream_output,
)
text = responses.choices[0].text
print(text)
gives:
The Commonwealth Bank of Australia (CBA) announced robust financial results for the first half of fiscal year 2
Until I see otherwise, I'm going to assume the strict model card format, with a space between <s> and [INST], is required as they say, at least until there are Mistral models that have no space. With that change these particular cases do not have issues. Will re-open if I see others.
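For reference, a minimal sketch of the change being discussed. The prompt wording here is illustrative; the only difference between the two strings is the single space after the BOS string:

```python
# Illustrative only: the fix under discussion is the single space between the
# BOS string "<s>" and "[INST]".
doc = "The Commonwealth Bank of Australia (CBA) reported strong results..."

prompt_broken = f"<s>[INST] Summarize the following text:\n{doc}\n[/INST]"
prompt_fixed = f"<s> [INST] Summarize the following text:\n{doc}\n[/INST]"

# The two prompts are identical except for that one space.
assert prompt_fixed.replace("<s> ", "<s>", 1) == prompt_broken
```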
I am experiencing the same thing with OpenHermes2.5-Mistral 7B AWQ. The chat template fix (I was applying ChatML by hand and switched to tokenizer.apply_chat_template) didn't seem to help. Does anyone have a fix?
@pseudotensor can you please reopen this issue? I too am facing this with Mixtral. I'm trying to generate JSONs, and they often get truncated, always ending at the character "2", just like in your case (while trying to generate years like 2023 and 2024).
@WoosukKwon very interesting/maddening bug!
Sure, I re-opened. I agree it's unlikely the prompt change should have mattered so much.
@vibhuagrawal14 I am seeing exactly the same bug. When writing years or dates, it stops at "2". This is with the Mixtral model. @pseudotensor: any fixes or suggestions? Thanks.
@pseudotensor fixing the spacing between BOS string and [INST] does appear to have fixed the issue. Thanks.
I actually needed to use double spaces between the BOS (<s>) and [INST] for it to work, although in my case the response was truncated at numbers other than 2.
We are facing this issue as well. Is there any workaround?
Adding a space between BOS and [INST] fixes this issue for us as well.
It sounds like a fix is needed in the chat template? https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1/blob/1e637f2d7cb0a9d6fb1922f305cb784995190a83/tokenizer_config.json#L42
Here's a fix https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1/discussions/176/files but waiting for the mistral team.
You can load the fixed version of the chat template here in vLLM: https://docs.vllm.ai/en/latest/serving/openai_compatible_server.html#chat-template
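For anyone building prompts by hand in the meantime, here is a rough pure-Python rendering of what the fixed template is supposed to emit. The exact whitespace is the point of contention in this thread, so treat this as an illustrative sketch (the function name is my own), not the authoritative template:

```python
def mixtral_prompt(messages, bos="<s>", eos="</s>"):
    # Rough rendering of the *fixed* Mixtral chat template: note the space
    # between the BOS string and "[INST]", which this thread identifies as
    # the crucial detail. Whitespace placement is illustrative.
    out = bos
    for m in messages:
        if m["role"] == "user":
            out += " [INST] " + m["content"] + " [/INST]"
        elif m["role"] == "assistant":
            out += " " + m["content"] + eos
    return out

print(mixtral_prompt([{"role": "user", "content": "Summarize: ..."}]))
# -> <s> [INST] Summarize: ... [/INST]
```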
See also this discussion: https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1/discussions/182 The spaces almost totally fix the issue, but not completely. It seems to arise from Mistral's training corpus, which likely includes corrupted files.
We noticed Mixtral behaving oddly and narrowed it down to a (maybe) 100% repro on 0.2.7. The script is in the zip file; just replace base_url's FILLME with your endpoint.
testmixnew1.py.zip
Mixtral was run like:
The output is:
This output is bad compared to normal because it is truncated. The server reports a normal stop, but I don't believe it.
The prompt we used is a bit odd in order to repro what we see with normal prompts, so ignore that aspect.
There are several \u escape sequences in the text, which I'm worried may be leading to the premature stop.
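One quick way to check that last worry, assuming the escapes appear literally in the prompt string (the function name is my own):

```python
import re

def find_suspect_sequences(text):
    # Literal \uXXXX escape sequences (i.e. a backslash-u left in the string
    # itself), plus any non-ASCII characters that already decoded.
    escapes = re.findall(r"\\u[0-9a-fA-F]{4}", text)
    non_ascii = sorted({c for c in text if ord(c) > 127})
    return escapes, non_ascii

# The raw string keeps the backslash, simulating an undecoded escape.
print(find_suspect_sequences(r"profit of \u20ac5 billion"))
```

If the first list is non-empty, the escapes were never decoded before reaching the model, which is worth ruling out before blaming the server's stop handling.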