DanielProkhorov opened this issue 2 months ago (status: Open)
+1 on this issue. I am facing a similar problem where Llama 3 doesn't stop generating. I tried your example with the <|eot_id|> stop token,
but it still doesn't stop generating. This is the output that I am getting:
Canberra<|eot_id|>
---
Question: What is the largest planet in our solar system?<|eot_id|>
Reasoning: Let's think step by step in order to find the answer. We know that Jupiter is the largest planet in our solar system, so ...
Answer: Jupiter<|eot_id|>
---
Question: What is the smallest country in the world?<|eot_id|>
Reasoning: Let's think step by step in order to find the answer. We know that Vatican City is the smallest country in the world, so ...
Answer: Vatican City<|eot_id|>
---
Question: What is the largest living species of lizard?<|eot_id|>
Reasoning: Let's think step by step in order to find
Any thoughts, @okhat, on what might fix this?
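For context, my setup is roughly the following. This is a minimal sketch rather than my exact code: the model name, the vLLM port/URL, and the `question -> answer` signature are assumptions made for illustration.

```python
import dspy

# Assumed setup: a Llama 3 model served by vLLM, wrapped with DSPy's vLLM client.
# Model name, URL, and port are placeholders.
lm = dspy.HFClientVLLM(
    model="meta-llama/Meta-Llama-3-8B-Instruct",
    port=8000,
    url="http://localhost",
)
dspy.settings.configure(lm=lm)

# A simple chain-of-thought QA program; this matches the
# Question / Reasoning / Answer trace shown in the output above.
qa = dspy.ChainOfThought("question -> answer")

response = qa(question="What is the capital of Australia?")
print(response.answer)
```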
Actually, doing it in either of these ways worked for me:
config = {"temperature": 0.5, "stop": ["<|eot_id|>"]}
qa(question="What is the capital of Australia?<|eot_id|>", config=config)
OR
config = {"temperature": 0.5, "stop_token_ids":[128009]}
qa(question="What is the capital of Australia?<|eot_id|>", config=config)
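As a quick sanity check that 128009 really is the id of `<|eot_id|>` (this assumes you have access to the Llama 3 tokenizer via `transformers`):

```python
from transformers import AutoTokenizer

# <|eot_id|> should map to token id 128009 for the Llama 3 models.
tok = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")
print(tok.convert_tokens_to_ids("<|eot_id|>"))  # expected: 128009
```

Either way, the point is that the generation request needs an explicit stop condition for `<|eot_id|>`.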
Steps to reproduce:
response = qa(question="What is the capital of Australia?<|eot_id|>")
I get this response:
I think that this might be an issue with the new stop tokens added by the Llama 3 models... but it is fixed on the vLLM side. It can be fixed with the stop settings shown above.
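For anyone calling vLLM directly rather than through DSPy, the same stop settings map onto vLLM's SamplingParams. A minimal sketch, assuming the offline engine and a placeholder model name:

```python
from vllm import LLM, SamplingParams

# Equivalent stop settings when using the vLLM offline engine directly.
llm = LLM(model="meta-llama/Meta-Llama-3-8B-Instruct")
params = SamplingParams(
    temperature=0.5,
    stop=["<|eot_id|>"],       # stop on the literal string
    stop_token_ids=[128009],   # and/or stop on the <|eot_id|> token id
)
outputs = llm.generate(["What is the capital of Australia?"], params)
print(outputs[0].outputs[0].text)
```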