Open fahim9778 opened 2 months ago
Hi, can you try making your system prompt more explicit about how much the model should respond? When you say "act", it probably directs the LLM to act out a whole script. I would try something like sys: "You are an experienced.... Your responses must be no more than one sentence".
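As a sketch of that suggestion, the system message below spells out the length limit and explicitly forbids writing the user's turns (the wording and the example question are illustrative, not from the original issue, and assume the usual role/content message format):

```python
# Illustrative system prompt that bounds reply length and forbids
# the model from scripting both sides of the conversation.
messages = [
    {
        "role": "system",
        "content": (
            "You are an experienced customer-support agent. "
            "Answer only the user's last message. "
            "Your responses must be no more than one sentence. "
            "Never write the user's side of the conversation."
        ),
    },
    {"role": "user", "content": "How do I reset my password?"},
]
```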
I tried, but nothing works on my side. Can you please share your code so that I can learn from it?
Describe the bug
I am trying to eliminate this self-chattiness, following several methods found on the internet, but no solution has worked yet. Can anyone please help with this? I have been stuck on it for the last 7 days, burning GPU memory and allocation hours with no result.
Minimal reproducible example
Then I use the HF TGI pipeline.
Then I am using this template to simulate the chat-bot conversation.
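For reference, a hand-rolled sketch of the Llama-3-Instruct chat layout is below. The special tokens match the published meta-llama-3-8b-instruct format; in practice `tokenizer.apply_chat_template` builds the same string, so this is only to make the structure visible (the function name and example content are my own):

```python
# Sketch of the Llama-3-Instruct prompt layout. Ending with an open
# assistant header is what tells the model to complete only its own turn.
def build_llama3_prompt(system, turns):
    """turns: list of (role, content) pairs with roles 'user' / 'assistant'."""
    prompt = "<|begin_of_text|>"
    prompt += f"<|start_header_id|>system<|end_header_id|>\n\n{system}<|eot_id|>"
    for role, content in turns:
        prompt += (
            f"<|start_header_id|>{role}<|end_header_id|>\n\n{content}<|eot_id|>"
        )
    # Open assistant header, no <|eot_id|>: the model fills in this turn.
    prompt += "<|start_header_id|>assistant<|end_header_id|>\n\n"
    return prompt

prompt = build_llama3_prompt(
    "Reply in at most one sentence.",
    [("user", "Hi, who are you?")],
)
```

If the prompt is assembled by plain string concatenation without the closing `<|eot_id|>` markers, the model has no signal for where a turn ends, which is one common cause of it rambling into extra turns.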
Output
Runtime Environment
meta-llama-3-8b-instruct
Additional context Can you please help solve this annoyance? Thanks in advance!
I tried with
meta-llama/Llama-2-7b-chat-hf
and still got the same chattiness:
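One workaround that often helps with both checkpoints, independent of prompting: pass the turn-end token as `eos_token_id` during generation, and additionally truncate the decoded text at the first fabricated turn marker. A minimal post-processing sketch (the function name and stop list are illustrative, not from this thread):

```python
# Cut the generated text at the first stop marker. Useful when a chat
# model keeps scripting fake "User:" / "Assistant:" turns after its answer.
def truncate_at_stop(text, stops=("<|eot_id|>", "\nUser:", "\nAssistant:")):
    cut = len(text)
    for stop in stops:
        idx = text.find(stop)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut].strip()

print(truncate_at_stop("Sure, done.<|eot_id|>\nUser: thanks"))  # → Sure, done.
```

This does not fix the root cause (usually a prompt that lacks proper turn delimiters), but it keeps the extra turns out of the user-visible output.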