huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

Add PromptTemplate and allow for default PromptTemplate in model configuration #25147

Open vincentmin opened 1 year ago

vincentmin commented 1 year ago

Feature request

As a user, I want to be able to load a model and feed it my input in such a way that it matches the prompt template that it saw during training. I want to be able to load the default prompt with a few lines of code and without having to look up how the model was trained. Additionally, I want to be able to modify the prompt to be different from the default prompt.

The specific implementation is up for discussion. I imagine something like this:

from transformers import AutoModelForCausalLM, AutoTokenizer, AutoPromptTemplate

model_id = "meta-llama/Llama-2-xb-chat-hf"
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)
prompt_template = AutoPromptTemplate.from_pretrained(model_id)

inputs = {
   "system_prompt":"You are a helpful assistant",
   "interactions":[
      {"user":"What is the fastest sea mammal?"},
      {"assistant":"The fastest sea mammal is the peregrine falcon"},
      {"user":"the peregrine falcon is not a mammal"}
   ]
}

output = model(**tokenizer(prompt_template(inputs)))

Motivation

The Hugging Face Hub is accumulating many fine-tuned models, which have been trained with a specific prompt template in mind. However, this prompt template is often difficult to find, and even more often the prompt template is missing entirely from the model card. If the model is invoked with a different template, the model performance can be severely affected. The community would benefit from a PromptTemplate class, loadable from the model configuration, that handles the prompt templating for the end user.

At this very moment, there are likely many users who are using the meta-llama/Llama-2-xb-chat-hf models with a prompting style that differs from how the models are intended to be used.

Your contribution

I am happy to be a part of the discussion for implementation and testing.

sgugger commented 1 year ago

cc @ArthurZucker

jimmytyyang commented 1 year ago

This is 100% needed!

ArthurZucker commented 1 year ago

Hey! Thanks for opening this. Not sure if you have seen this, but we have the ConversationalPipeline along with the Conversation object, which can pretty easily handle conversations. You just need to override the _build_conversation_input_ids of the tokenizer that you are using. This allows anyone to properly build their inputs and share the modeling code on the Hub. Having an entirely new Auto module just for that is overkill, and not really the intent of transformers.

However, adding support for system_prompts in the Conversation object or the ConversationalPipeline can be done. We were not entirely sure whether it would be highly requested or not.
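
For concreteness, a minimal sketch of such an override might look like the following (purely illustrative; the role markers are made up and not any model's official format):

from transformers import LlamaTokenizerFast

class MyChatTokenizer(LlamaTokenizerFast):
    # Called by ConversationalPipeline to turn a Conversation into input ids.
    def _build_conversation_input_ids(self, conversation):
        text = ""
        # Conversation.iter_texts() yields (is_user, text) pairs in order.
        for is_user, content in conversation.iter_texts():
            role = "User" if is_user else "Assistant"
            text += f"{role}: {content}\n"
        text += "Assistant:"
        return self.encode(text, add_special_tokens=False)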

vincentmin commented 1 year ago

Hi @ArthurZucker, thanks for your reply. I was unaware of the ConversationalPipeline, so thanks for putting it on my radar. However, neither the ConversationalPipeline nor the Conversation class handles the templating that is really the core of this feature request. Perhaps an illustration with some examples will be helpful:

The Llama-2-xb-chat models use a very specific format of the following type:

# system_message is the system prompt; chatbot is a list of (user_message, model_response) pairs
input_prompt = f"[INST] <<SYS>>\n{system_message}\n<</SYS>>\n\n "
for interaction in chatbot:
    input_prompt = input_prompt + str(interaction[0]) + " [/INST] " + str(interaction[1]) + " </s><s> [INST] "

Instead, oasst1 models often use a format of the following type:

input_prompt = f"""<|system|>{system_message}</s><|prompter|>{user_prompt}</s><|assistant|>"""

Even models that are not chat models can have very specific prompt templates, such as this sql model:

# question is the natural-language question; table is a list of table schema strings
table_prefix = "table:"
question_prefix = "question:"
join_table = ",".join(table)
input_prompt = f"{question_prefix} {question} {table_prefix} {join_table}"

I hope this illustrates that many models (not just chat models) on the Hugging Face Hub come with an implicit, specific prompt template. However, there is currently no way (that I know of) to instruct users to follow that specific prompt template, other than describing the template on the model card. With this feature request, I am suggesting that we create a more standardised way for model creators to add a prompt template to their model page.

Note that the llama-2-70b-chat-hf model card makes no mention of the expected prompt template. I think it is therefore likely that a significant portion of users are currently using the model with a different prompt template and observing reduced model performance as a consequence.

If transformers provided a standardised way to add prompt templates, I believe this would create an incentive for model creators to add their prompt template. This, combined with an easy way to use said template, would make it easier for users to get the best out of models on the Hugging Face Hub.

For the implementation, it is probably not necessary to have an entirely new Auto module. I'll let the developers be the judge of how best to implement this.

Rocketknight1 commented 1 year ago

Hi @vincentmin! We did some internal discussion and we decided this was a great idea. We're still discussing the specifics, but our current plan is to add a prompt field to tokenizer_config.json. The method that formats conversational prompts is Tokenizer._build_conversation_input_ids(), which is called by ConversationalPipeline. Therefore, we think tokenizer_config.json is the right place to add fields that override the behaviour of the underlying Tokenizer.

The specific fields in prompt would be class-specific, but for conversational models they would be e.g. system_message_start, system_message_end, etc. We think breaking up the prompt into string fields will work, and avoids the need to store full templates in the config files. These fields will be read by the tokenizer and used in _build_conversation_input_ids() to customize input prompts correctly.
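
For illustration only, a rough sketch of the kind of config this plan implies (all field names here are hypothetical and not final):

# Hypothetical "prompt" section of tokenizer_config.json, expressed as a Python dict
prompt_config = {
    "system_message_start": "<<SYS>>\n",
    "system_message_end": "\n<</SYS>>\n\n",
    "user_message_start": "[INST] ",
    "user_message_end": " [/INST] ",
    "assistant_message_end": " </s><s> ",
}

# Sketch of how a _build_conversation_input_ids()-style method could assemble a prompt
# from these fields (interactions is a list of (user_message, assistant_message) pairs).
def build_prompt(system_message, interactions, cfg=prompt_config):
    text = cfg["system_message_start"] + system_message + cfg["system_message_end"]
    for user_msg, assistant_msg in interactions:
        text += cfg["user_message_start"] + user_msg + cfg["user_message_end"]
        text += assistant_msg + cfg["assistant_message_end"]
    return text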

Since _build_conversation_input_ids() is currently a private method that we mostly use internally in the Pipeline code, we may also look at ways to expose the prompt information through other properties or methods.

WDYT? The details are still flexible, but we're planning to finalize a concrete plan soon!

MrRace commented 1 year ago

@Rocketknight1 How can I use the ConversationalPipeline for Llama 2 chat? I want to do multi-turn chat. Could you show an example? Here is my code:

from transformers import LlamaTokenizerFast
from transformers import pipeline, Conversation
import torch

model_path = "/home/model_zoo/LLM/llama2/Llama-2-7b-chat-hf"

tokenizer = LlamaTokenizerFast.from_pretrained(model_path)
chatbot = pipeline(
    "conversational",
    model=model_path,
    tokenizer=tokenizer,
    torch_dtype=torch.float16,
    device_map="auto",
)

conversation_1 = Conversation("Going to the movies tonight - any suggestions?")
conversation_2 = Conversation("What's the last book you have read?")

print(chatbot([conversation_1, conversation_2]))

However, it does not return a normal response.

vincentmin commented 1 year ago

Hi @Rocketknight1, that is great to hear!

I like the proposal of adding a prompt field to tokenizer_config.json.

How do you intend to let Tokenizer._build_conversation_input_ids() use this prompt field? Will the current implementation of this function be modified as part of this issue, or is that left to the model creators? Since model prompting can get pretty wild, it may be hard to give a sufficiently general implementation for Tokenizer._build_conversation_input_ids() that works for all use cases.

Rocketknight1 commented 1 year ago

Hi @vincentmin, you're right, it's a surprisingly tricky question! My initial idea was that _build_conversation_input_ids() would be defined at the class level, but would read string arguments like system_message_start from the tokenizer config. However, this still hard-codes the ordering of elements in the prompt, which means it might not work for some prompts. I think we'll still do something like that for now and see how much of a problem it is, and if we have to, we'll look into allowing some kind of more general template system.

This will require us to modify Tokenizer._build_conversation_input_ids() for each model for which we want to support this, but we can do it one model at a time without needing a codebase-wide refactor.

Rocketknight1 commented 1 year ago

PR is open at #25323!

annahung31 commented 1 year ago

@MrRace It might be late for your question, but I'll still leave the demo here for others' reference.

import transformers
from transformers import AutoTokenizer, Conversation
import torch

model_path= "/home/model_zoo/LLM/llama2/Llama-2-7b-chat-hf"

tokenizer = AutoTokenizer.from_pretrained(model_path)

chatbot = transformers.pipeline(
    "conversational",
    model=model_path,
    tokenizer=tokenizer,
    torch_dtype=torch.float16,
    device_map="auto",
)

conversation = Conversation("Going to the movies tonight - any suggestions?")
conversation = chatbot(conversation, max_length=500)  # the default max_length is 200; increase it or you'll get an empty response

for msg in conversation.generated_responses[-1].split('\n'):
    print(msg)

### second round of conversation
conversation.add_user_input("I am afraid of watching thriller movies and preferred watching stories related to friendship between women. Then which one should I choose?")

conversation = chatbot(conversation, max_length=500)

for msg in conversation.generated_responses[-1].split('\n'):
    print(msg)

shimizust commented 9 months ago

@Rocketknight1 Thanks for implementing this. Many of the models we fine-tune are not meant for chat/conversations; instead, they are meant to provide a single response to a well-structured prompt. For example, we may be doing batch inference to summarize lots of articles.

While chat_template solves the chat use case, I notice the implementation requires a list of dicts representing the chat history, so a template can't just look like this:

Summarize the following article: {{ article }}

You could make messages = [{"article": "some text"}] a single-element list and write the template like this:

Summarize the following article: {{ messages[0].article }}

Basically, the single-generation use case would just be considered a subset of the chat use case. Is that the recommendation? Another option could be renaming it to prompt_template to be more generic and/or making the input more flexible (not just List[Dict] | Conversation).

Rocketknight1 commented 9 months ago

Hi @shimizust, this is a really interesting question! When I was designing the spec, I did realize that people would eventually want to use chat templates for things besides chat. As a result, the prompt format is quite flexible. In fact, I believe you should be able to pass a raw string to apply_chat_template and write a template to support it!

Most templates have a loop like {% for message in messages %} that loops over a list of messages. However, even though the input is always called "messages", I think it would still work if you passed a string, in which case you could probably just write a template like this:

{{ "Summarize the following article: " + messages }}

and then just

tokenizer.apply_chat_template(article)

Your solution of using an article key in the message dicts would also work, and might be safer. Feel free to experiment and let me know if you encounter any difficulties - I think you're the first person we know of that's trying this for a non-chat use case, so we're definitely interested in hearing about your experience!
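
Putting those pieces together, a self-contained sketch might look like this (the model name is just a placeholder and, as noted, passing a raw string is not officially supported):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-chat-hf")  # placeholder model
template = '{{ "Summarize the following article: " + messages }}'
article = "Transformers provides thousands of pretrained models for text, vision and audio tasks."
# tokenize=False returns the rendered prompt string instead of token ids
prompt = tokenizer.apply_chat_template(article, chat_template=template, tokenize=False)
print(prompt)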

shimizust commented 9 months ago

@Rocketknight1 Thanks for the response! You're right, you can do something like: tokenizer.apply_chat_template("my_text", chat_template="Here is my text: {{messages}}").

I guess my example was too simple. Usually the prompt would need to be constructed from several features. For example:

Write an article about {{location}} from the perspective of a {{occupation}} in the year {{year}} 

And then ideally you just pass a dictionary like this to apply_chat_template():

{
  "location": "Mars",
  "occupation": "farmer",
  "year": 2100
}

vs. currently, the template would need to look like the following, which is a bit unintuitive:

Write an article about {{messages[0].location}} from the perspective of a {{messages[0].occupation}} in the year {{messages[0].year}} 

and input being:

[
  {
    "location": "Mars",
    "occupation": "farmer",
    "year": 2100
  }
]

Rocketknight1 commented 9 months ago

Hi @shimizust - although it's not officially supported, I think it would work if you pass a single dict to apply_chat_template. It would still be called 'messages' inside the template, but you could access it with {{messages['location']}} in the template, which might be a little cleaner.
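
For example, something along these lines might work (untested sketch; the model name, template, and values are purely illustrative):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-chat-hf")  # placeholder model
template = (
    "Write an article about {{ messages['location'] }} "
    "from the perspective of a {{ messages['occupation'] }} "
    "in the year {{ messages['year'] }}"
)
features = {"location": "Mars", "occupation": "farmer", "year": 2100}
# Passing a plain dict instead of a list of message dicts is not officially supported
prompt = tokenizer.apply_chat_template(features, chat_template=template, tokenize=False)
print(prompt)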

Let me know if you try it!

shimizust commented 9 months ago

@Rocketknight1 Gotcha, yeah passing a dict directly to apply_chat_template works. Thank you