I have added initial support here: https://github.com/oobabooga/text-generation-webui/commit/cb95a2432c9506e494e018f2424f04b934bb678c
It seems to work at first glance.
Hi @oobabooga, thanks for your support!
It seems to be working in single-turn mode. I suggest adding model_type: 'llama' to config.yaml, to avoid having to specify --model_type llama on the command line.
One doubt about whitespace: according to the makers of Koala, BEGINNING OF CONVERSATION: and USER: should each include a trailing space, and I don't know whether textgen preserves those if present.
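To make the concern concrete, here is a tiny sketch (the string constants are taken verbatim from the Koala docs; this is not textgen code):

```python
# Per the Koala docs, these prompt fragments end with a space. A UI
# that trims whitespace from template fields would silently drop it.
CONTEXT = "BEGINNING OF CONVERSATION: "  # trailing space matters
USER_PREFIX = "USER: "                   # trailing space matters

for fragment in (CONTEXT, USER_PREFIX):
    assert fragment.endswith(" "), f"trailing space lost in {fragment!r}"
```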
About multi-turn: do you think it would be feasible to feed the model's output back in as part of the next turn's input, to provide context and carry the dialogue over? It's documented in their inference code/docs, and it works decently on their live demo (a sketch follows the example below):
TURN 1:
BEGINNING OF CONVERSATION: USER: Hello! GPT:Hi! How can I help you?</s>
TURN 2:
BEGINNING OF CONVERSATION: USER: Hello! GPT:Hi! How can I help you?</s>USER: What is the largest animal on earth? GPT:
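A minimal sketch of how that carry-over could work; build_koala_prompt and the history list are hypothetical illustrations, not textgen's actual API:

```python
def build_koala_prompt(history, new_user_input):
    """Build the full Koala prompt from past (user, reply) turns.

    Spacing follows the Koala docs: a space after each user input
    before "GPT:", no space after "GPT:", "</s>" after each reply.
    """
    prompt = "BEGINNING OF CONVERSATION: "
    for user_msg, model_reply in history:
        prompt += f"USER: {user_msg} GPT:{model_reply}</s>"
    prompt += f"USER: {new_user_input} GPT:"
    return prompt

history = [("Hello!", "Hi! How can I help you?")]
print(build_koala_prompt(history, "What is the largest animal on earth?"))
# Prints exactly the TURN 2 prompt above.
```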
Thank you for your outstanding work!
EDIT: it still isn't working; something is off: sending an empty string makes the model give random answers.
This issue has been closed due to inactivity for 30 days. If you believe it is still relevant, please leave a comment below.
Describe the bug
The prompt for Koala (https://bair.berkeley.edu/blog/2023/04/03/koala/) needs to be formatted very specifically, as described in https://github.com/young-geng/EasyLM/blob/main/docs/koala.md#koala-chatbot-prompts:
As can be seen in the serving command above, the Koala chatbot requires a series of prompts to be prepended and appended to the user input in order to generate responses correctly. Hence, to use the Koala weights in other frameworks, you will need to process the prompts accordingly.
The beginning-of-prompt string
BEGINNING OF CONVERSATION:
is always prepended to every conversation. For each user input, the user prompt USER: is prepended to the user input, a space is appended to the user input, and then the language model prompt GPT: is appended. This whole string is used as the prompt input to the language model for generating the response. For example, in the first round of conversation, when the user inputs Hello!, the whole prompt for generating the first response is:
BEGINNING OF CONVERSATION: USER: Hello! GPT:
After the language model generates the response, we append the response to the prompt and then append the EOS token </s>. Suppose the language model generates the response Hi! How can I help you?, and for the next round the user input is What is the largest animal on earth?. Then the whole prompt for generating the second response is:
BEGINNING OF CONVERSATION: USER: Hello! GPT:Hi! How can I help you?</s>USER: What is the largest animal on earth? GPT:
Note that because the prompt and the generated parts are tokenized separately, there is no space between the model prompt GPT: and the generated response.
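Restating the documented second-round prompt as a Python string makes the whitespace subtleties explicit (this is just the example above spelled out, not implementation code):

```python
turn2_prompt = (
    "BEGINNING OF CONVERSATION: "                  # always prepended
    "USER: Hello! "                                # user input + appended space
    "GPT:" "Hi! How can I help you?" "</s>"        # no space after "GPT:"; EOS glued on
    "USER: What is the largest animal on earth? "  # second user input + appended space
    "GPT:"                                         # generation continues from here
)
assert " GPT:Hi!" in turn2_prompt  # space before "GPT:", none after it
```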
Is there an existing issue for this?
Reproduction
I created my own Koala.yaml in characters/instruction-following (attached: Koala.yaml).
Unfortunately, something is off. My wild assumption is that the prompt concatenation explained in https://github.com/young-geng/EasyLM/blob/main/docs/koala.md#koala-chatbot-prompts needs to be implemented.
Koala is a multi-turn-tuned model, so let's reproduce such a scenario:
EasyLM inference
USER: Describe ten different open source AI libraries (and the languages they work with) that I can use to generate a summary of the main core contents of any web page, ignoring any ads or embedded materials.
USER: Are any of these free?
Text-generation-webui inference
USER: Describe ten different open source AI libraries (and the languages they work with) that I can use to generate a summary of the main core contents of any web page, ignoring any ads or embedded materials.
USER: Are any of these free?
Screenshot
No response
Logs
None. Supposedly the prompt concatenation explained in https://github.com/young-geng/EasyLM/blob/main/docs/koala.md#koala-chatbot-prompts needs to be implemented (a wild assumption).
System Info