meta-llama / llama-recipes

Scripts for fine-tuning Meta Llama 3 with composable FSDP & PEFT methods, covering single- and multi-node GPU setups. Supports default & custom datasets for applications such as summarization and Q&A, and a number of inference solutions such as HF TGI and vLLM for local or cloud deployment. Includes demo apps showcasing Meta Llama 3 for WhatsApp & Messenger.

Llama 3 Preferred RAG Prompting Format (xml tags vs. markdown vs. something else) #450

Open krumeto opened 2 months ago

krumeto commented 2 months ago

🚀 The feature, motivation and pitch

Anthropic directly states that their models prefer the context in longer prompts (like typical RAG applications) to be wrapped in XML tags. Some claim OpenAI's models prefer markdown-style formatting (their docs mention both markdown and XML tags).

Does Llama 3 have a preferred format for longer prompts?

Thank you in advance!

Alternatives

No response

Additional context

No response

jeffxtang commented 2 months ago

There's no mention of a preferred format for Llama 3. According to the Llama 3 model card prompt format, you just need to follow the new Llama 3 format there (also specified in HF's blog here). But if you use a framework like LangChain, a service provider like Groq or Replicate, or run Llama 3 locally using Ollama for your RAG apps, you most likely won't need to deal with the new prompt format directly, as it's handled by them under the hood. Just use an appropriate RAG prompt (e.g. rag-prompt) with your question, context, and possibly chat history for Llama 3 to answer.
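For anyone who does need to build the raw prompt string themselves, here is a minimal sketch of wrapping a RAG prompt in the Llama 3 Instruct special-token format from the model card. The `build_prompt` helper and its argument names are illustrative, not an official API:

```python
def build_prompt(system: str, user: str) -> str:
    """Assemble a single-turn Llama 3 Instruct prompt string.

    Uses the special tokens documented in the Llama 3 model card:
    <|begin_of_text|>, <|start_header_id|>...<|end_header_id|>, <|eot_id|>.
    """
    return (
        "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n"
        f"{system}<|eot_id|>"
        "<|start_header_id|>user<|end_header_id|>\n\n"
        f"{user}<|eot_id|>"
        # The prompt ends with an open assistant header so the model
        # generates the assistant turn next.
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )

prompt = build_prompt(
    system="Answer using only the provided context.",
    user="Context:\n{context}\n\nQuestion: {question}",
)
```

Whatever RAG layout you choose (XML tags, markdown, etc.) goes inside the user turn; the special tokens only delimit the turns themselves.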

krumeto commented 2 months ago

Thank you, @jeffxtang! I am aware of the new prompt format.

I was asking more about model preferences regarding RAG type of prompts and longer input prompts.

Example 1 - XML tags (aka the way Anthropic recommends their models to be prompted):

Please analyze this document and write a detailed summary memo according to the instructions below, following the format given in the example:
<document>
{{DOCUMENT}}
</document>

<instructions>
{{DETAILED_INSTRUCTIONS}}
</instructions>

<example>
{{EXAMPLE}}
</example>

Example 2 (formatted as markdown):

Please analyze this document and write a detailed summary memo according to the instructions below, following the format given in the example:

## Document
{{DOCUMENT}}

## Instructions
{{DETAILED_INSTRUCTIONS}}

## Examples
{{EXAMPLE}}

Example 3 (special tokens, as used for example by https://huggingface.co/jondurbin/airoboros-l2-c70b-3.1.2):

BEGININPUT
Please analyze this document and write a detailed summary memo according to the instructions below, following the format given in the example:

BEGINCONTEXT
{{DOCUMENT}}
ENDCONTEXT

BEGINEXAMPLES
{{EXAMPLE}}
ENDEXAMPLES
ENDINPUT

BEGININSTRUCTION
{{DETAILED_INSTRUCTIONS}}
ENDINSTRUCTION
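For comparison purposes, all three variants can be rendered from the same inputs with a small helper. This is only a sketch: the template strings mirror the examples above, and the function names are made up:

```python
# Shared intro sentence used by all three candidate formats.
INTRO = (
    "Please analyze this document and write a detailed summary memo "
    "according to the instructions below, following the format given "
    "in the example:"
)

# Candidate prompt layouts: XML tags, markdown headings, special tokens.
TEMPLATES = {
    "xml": (
        INTRO + "\n<document>\n{document}\n</document>\n\n"
        "<instructions>\n{instructions}\n</instructions>\n\n"
        "<example>\n{example}\n</example>"
    ),
    "markdown": (
        INTRO + "\n\n## Document\n{document}\n\n"
        "## Instructions\n{instructions}\n\n"
        "## Examples\n{example}"
    ),
    "special_tokens": (
        "BEGININPUT\n" + INTRO + "\n\nBEGINCONTEXT\n{document}\nENDCONTEXT\n\n"
        "BEGINEXAMPLES\n{example}\nENDEXAMPLES\nENDINPUT\n\n"
        "BEGININSTRUCTION\n{instructions}\nENDINSTRUCTION"
    ),
}

def render(fmt: str, document: str, instructions: str, example: str) -> str:
    """Fill one of the candidate prompt templates with the same inputs."""
    return TEMPLATES[fmt].format(
        document=document, instructions=instructions, example=example
    )
```

Rendering the same document/instructions/example triple through each template makes it straightforward to A/B test the formats against the same evaluation set.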

Is there a format that Llama-3 Instruct models prefer?

jeffxtang commented 2 months ago

I'm not aware of such a preference for Llama 3, but with an automated RAG evaluation framework (there are quite a few good open-source ones) it should be easy to compare the results of your example across the different formats and see whether there's any quality difference. @krumeto
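The comparison loop such a framework runs could be sketched as below. Everything here is a placeholder: `llm` stands in for your model client, `token_f1` is a toy overlap metric standing in for a real evaluator, and `compare_formats` is a made-up name:

```python
def token_f1(prediction: str, reference: str) -> float:
    """Toy token-overlap F1, standing in for a real evaluation metric."""
    pred, ref = prediction.lower().split(), reference.lower().split()
    if not pred or not ref:
        return 0.0
    overlap = len(set(pred) & set(ref))
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(pred), overlap / len(ref)
    return 2 * precision * recall / (precision + recall)

def compare_formats(render_fns, llm, dataset, score=token_f1):
    """Average score per prompt format over (doc, question, answer) rows.

    render_fns: {format_name: fn(document, question) -> prompt string}
    llm:        fn(prompt) -> model answer string
    dataset:    iterable of (document, question, gold_answer) triples
    """
    results = {}
    for name, render in render_fns.items():
        scores = [score(llm(render(doc, q)), gold) for doc, q, gold in dataset]
        results[name] = sum(scores) / len(scores)
    return results
```

Running the same dataset through each format and comparing the averages would give at least a rough, model-specific answer to the question in this issue.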

trivikramak commented 2 months ago

Hi @krumeto, were you able to find out what works best with Llama-3?

krumeto commented 2 months ago

Hey @trivikramak, no, not yet (apologies).

scottstirling commented 1 month ago

Ask it and see what it says. Try some different stuff. Interesting question.

mindful-time commented 1 week ago

I using llama3:instruct and llama3:latest model from Ollama for this and seems like providing system prompt with chat history and context just doesn't work out of the box. I am getting replies from the both where it just keep giving me the entire system prompt as the answer when just asking Hi or any other question. Seems like it's not working like you expect with OpenAI , also using langchain's chatollama class for it. So this will definitely help.