pseudotensor opened 4 months ago
This is purely a chat template thing; I have it implemented on my models with a custom-written chat template. For example, the following is a Llama 3 template that supports assistant response prefill:
{%- set loop_messages = messages %}
{%- for message in loop_messages %}
{#- Render each message with Llama 3 role headers. #}
{%- set content = '<|start_header_id|>' + message['role'] + '<|end_header_id|>\n\n' + message['content'] | trim %}
{%- if loop.index0 == 0 %}
{%- set content = bos_token + content %}
{%- endif %}
{#- Omit <|eot_id|> on a trailing assistant message so generation continues it. #}
{%- if not (loop.last and message['role'] == 'assistant') %}
{%- set content = content + '<|eot_id|>' %}
{%- endif %}
{{- content }}
{%- endfor %}
{#- Add the generation prompt only when there is no assistant prefill. #}
{%- if messages[-1]['role'] != 'assistant' %}
{{- '<|start_header_id|>assistant<|end_header_id|>\n\n' }}
{%- endif %}
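For reference, here's a minimal sketch of applying a template like the one above with transformers' apply_chat_template; the model name and template file path are placeholders:

from transformers import AutoTokenizer

# The Jinja template above, saved to a file (path is a placeholder)
with open("llama3_prefill.jinja") as f:
    chat_template = f.read()

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")

messages = [
    {"role": "user", "content": "Hello there!"},
    {"role": "assistant", "content": "Hi! My name is"},  # prefill to continue
]

# The trailing assistant message is rendered without <|eot_id|>,
# so the prompt ends mid-response and the model continues it.
prompt = tokenizer.apply_chat_template(messages, chat_template=chat_template, tokenize=False)
print(prompt)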
It works! Thanks
This issue has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this issue should remain open. Thank you!
Ideally, this should be supported out of the box, or at least documented with a note on how to use a template to enable the behavior. It's a pretty common technique (both Anthropic and Mistral explicitly document supporting it).
You might find the continue_final_message extra call argument helpful (docs).
Example:
curl -X POST "http://<vllm-address>/v1/chat/completions" -H "Content-Type: application/json" -d '
{
  "model": "<model name>",
  "messages": [
    {"role": "user", "content": "Hello there!"},
    {"role": "assistant", "content": "Hi! My name is"}
  ],
  "add_generation_prompt": false,
  "continue_final_message": true
}'
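The same flags can also be passed from the OpenAI Python client via extra_body; a sketch, where the base URL, API key, and model name are placeholders:

from openai import OpenAI

# vLLM's OpenAI-compatible server; address is a placeholder
client = OpenAI(base_url="http://<vllm-address>/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="<model name>",
    messages=[
        {"role": "user", "content": "Hello there!"},
        {"role": "assistant", "content": "Hi! My name is"},
    ],
    # vLLM-specific parameters are forwarded through extra_body
    extra_body={
        "add_generation_prompt": False,
        "continue_final_message": True,
    },
)
print(response.choices[0].message.content)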
Make sure you're running the latest vLLM server version.
🚀 The feature, motivation and pitch
https://docs.anthropic.com/en/docs/build-with-claude/prompt-engineering/prefill-claudes-response
https://www.anthropic.com/news/claude-2-1-prompting
I expected I could prefill the assistant response, but it seems like it doesn't work. I should be able to do something like this:
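For illustration, an OpenAI-style chat request whose last message is from the assistant (the model name is a placeholder):

{
  "model": "<model name>",
  "messages": [
    {"role": "user", "content": "Hello there!"},
    {"role": "assistant", "content": "Hi! My name is"}
  ]
}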
And it should render the prompt up through the assistant's response without ending the turn, so the model continues from the prefill. Anthropic has this feature, and it helps control the responses.
Alternatives
Yes, one can avoid the chat API and build prompts manually, but since chat templates are so pervasive and useful, it would be great to add this extension.
Additional context
I'm unclear whether it's even possible within the general chat framework. The Jinja2 template might support it out of the box, or it might depend on the template writer; but even if one had to tweak an existing template, the chat API would still need to handle it.