Closed: oscar-martin closed this issue 6 months ago
I have found it! Just adding a SamplingParams with skip_special_tokens=False to the llm.generate call made it work.

Code snippet:
from vllm import LLM, SamplingParams

# ...
llm = LLM(model=model, tokenizer=model)  # name or path of your fine-tuned model
p = SamplingParams(skip_special_tokens=False)  # keep added special tokens in the decoded output
output = llm.generate("<|begincontext|><|user|>I'm hungry. Find places to eat please.<|system|>Sure thing. Which city would you like to eat in?<|user|>Let's go with Foster City please.<|system|>Sure. What kind of food are you hungry for?<|user|>Spicy Indian sound really good.<|system|>One moment. I found a great restaurant called Pastries N Chaat in Foster City.<|user|>Give me other suggestions as well<|system|>How about, Tabla Indian Restaurant in Foster City?<|user|>Can you find out if they are average priced?<|system|>sure. The price range would be inexpensive.<|user|>Perfect. That works<|system|>Should I reserve for you?<|beginlastuserutterance|>Yes, go ahead and do that.<|endlastuserutterance|><|endcontext|>", p)
print(output)
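For reference, here is how to pull the decoded text out of the result (a short sketch, assuming the same llm and output objects as above). With skip_special_tokens=False the .text field keeps the added special tokens such as <|begintarget|> instead of dropping them:

completion = output[0].outputs[0]  # first completion of the first request
print(completion.text)             # decoded text, special tokens included
print(completion.token_ids)        # raw generated token ids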
Your current environment
How would you like to use vllm
I have a fine-tuned model (from mistralai/Mistral-7B-v0.1) with additional tokens added and trained.

Model config.json: the original model has 32000 as the vocab_size; I have added 27 additional tokens.
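For context, the extra tokens were presumably added before fine-tuning along these lines (an assumed sketch, not the author's actual training code; the token names are taken from the prompt in the answer above):

from transformers import AutoModelForCausalLM, AutoTokenizer

base = "mistralai/Mistral-7B-v0.1"
tokenizer = AutoTokenizer.from_pretrained(base)
hf_model = AutoModelForCausalLM.from_pretrained(base)

# A few of the 27 added tokens, for illustration only
extra = ["<|begincontext|>", "<|endcontext|>", "<|user|>", "<|system|>", "<|begintarget|>"]
tokenizer.add_special_tokens({"additional_special_tokens": extra})
hf_model.resize_token_embeddings(len(tokenizer))  # vocab_size grows beyond the original 32000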
When I use the model for inference, the generated text does not "decode" the additional tokens.
The code:
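Presumably the call looked essentially like the snippet in the accepted answer above, just without the SamplingParams override (a sketch under that assumption):

from vllm import LLM

model = "<model_and_tokenizer_id>"  # path of the fine-tuned model and tokenizer
llm = LLM(model=model, tokenizer=model)
# Same dialogue prompt as in the snippet above
output = llm.generate("<|begincontext|><|user|>I'm hungry. Find places to eat please. ... <|endlastuserutterance|><|endcontext|>")
print(output)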
Output (with a bit of formatting to improve readability):
From it, the output of output[0].outputs[0].text is ReserveRestaurant Restaurants^city->F. I have manually decoded the token_ids, and the expected text should be: <|begintarget|><|begindsts|><|begindst|><|beginintent|> ReserveRestaurant<|endintent|><|beginbelief|> Restaurants^city->F.

Generated token_ids: token_ids=[32003, 32010, 32012, 32023, 22249, 9133, 3507, 440, 32024, 32014, 23657, 1549, 28815, 18373, 471, 28765]. Tokens greater than or equal to 32000 are not decoded properly.

I have also tried with
python3 -m vllm.entrypoints.api_server --model "<model_and_tokenizer_id>" --tokenizer "<model_and_tokenizer_id>"
and I got the same behavior. What can I do to decode the generated text directly, without having to decode it manually?
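For completeness, the manual decoding mentioned above looks roughly like this (a sketch assuming the fine-tuned tokenizer is loaded from the same path, using the token ids shown above):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("<model_and_tokenizer_id>")
token_ids = [32003, 32010, 32012, 32023, 22249, 9133, 3507, 440,
             32024, 32014, 23657, 1549, 28815, 18373, 471, 28765]
print(tokenizer.decode(token_ids, skip_special_tokens=False))  # keeps the added special tokens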