meta-llama / llama-stack-apps

Agentic components of the Llama Stack APIs

Clarification on the system prompt for custom tool use #36

Open ricklamers opened 1 month ago

ricklamers commented 1 month ago

Awesome work! Just a quick question about the correct system prompt:

in the docs https://llama.meta.com/docs/model-cards-and-prompt-formats/llama3_1#user-defined-custom-tool-calling this is used:

If a you choose to call a function ONLY reply in the following format:
<{start_tag}={function_name}>{parameters}{end_tag}
where

start_tag => `<function`
parameters => a JSON dict with the function argument name as key and function argument value as value.
end_tag => `</function>`

Here is an example,
<function=example_function_name>{"example_name": "example_value"}</function>

Reminder:
- Function calls MUST follow the specified format
- Required parameters MUST be specified
- Only call one function at a time
- Put the entire function call reply on one line"
- Always add your sources when using search results to answer the user query

You are a helpful Assistant.

While in the repo this is used:

Think very carefully before calling functions.
If you choose to call a function ONLY reply in the following format with no prefix or suffix:

<function=example_function_name>{{"example_name": "example_value"}}</function>

Reminder:
- If looking for real time information use relevant functions before falling back to brave_search
- Function calls MUST follow the specified format, start with <function= and end with </function>
- Required parameters MUST be specified
- Only call one function at a time
- Put the entire function call reply on one line

Furthermore, could you clarify whether "Only call one function at a time" implies that parallel tool use is not intended for these instruction-tuned models (the Llama 3.1 family)?

e.g. "Please get the weather for San Francisco and Tokyo" can't generate:

<|start_header_id|>assistant<|end_header_id|>

<function=get_weather>{"location": "San Francisco"}</function>
<function=get_weather>{"location": "Tokyo"}</function><|eot_id|>

Thanks for clarifying!

Rick Lamers, AI Researcher at Groq

HamidShojanazeri commented 1 month ago

cc: @ashwinb

ashwinb commented 1 month ago

@ricklamers thanks for pointing out the discrepancy. Please use the version as specified in the code / this repo. We will update our documentation to match the version from the code.

Re: parallel tool calling, we are doing a couple quick experiments and will get back to you on that ASAP.

ricklamers commented 1 month ago

Awesome, thanks!

ricklamers commented 1 month ago

@ashwinb FYI in HF's chat template yet another prompt is used: https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct/blob/8c22764a7e3675c50d4c7c9a4edb474456022b16/tokenizer_config.json#L2053

Is that wrong? Should it follow the one in this repo?

ashwinb commented 1 month ago

@ricklamers :( not happy with these inconsistencies. It is hard to say something is wrong given the general stochasticity of tool calling, unfortunately.

All I will say is that this is the reason we put `llama model template --name <...>` in the llama CLI, so that's the definitive source our researchers generally recommend. Given rapid iteration times, sometimes these recommendations don't reach all the folks who need to see them.

ricklamers commented 1 month ago

No worries, as long as we know the correct system prompt (this repo) we can all adjust to converge to the same correct version. Any updates on parallel calls?

ricklamers commented 1 month ago

I've put out a note for them https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct/discussions/90

Rocketknight1 commented 1 month ago

Hey, Matt from Hugging Face here. Just to clarify, the HF template was written following the "JSON based tool calling" template in this doc, and the prompt we used was also copied from the example prompt there.

Based on this discussion, am I correct that the <function> format in this repo is preferred, and we shouldn't use JSON tool calling? If so, I should rewrite the whole template to use that instead, rather than just updating the system prompt.

Imbernoulli commented 1 month ago

So which template is preferred? The <function> one or the JSON one? They are both at https://llama.meta.com/docs/model-cards-and-prompt-formats/llama3_1/

hardikjshah commented 1 month ago

Both the JSON version and the <function> version work reasonably well. We observed that the JSON one tends to over-steer towards using tools even when none is asked for, while with <function> we were able to control that a bit more. The JSON version had higher recall but more false positives, while <function> had lower recall with higher precision. So tbh it's a bit use-case specific, and I'd suggest you try both and see which works best for you.
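
For concreteness, here is a rough sketch of how a client might accept either format; it assumes the JSON variant emits an object like {"name": ..., "parameters": {...}} and the <function> variant uses the tag format shown earlier in this thread. The helper below is hypothetical, not an API from this repo.

```python
import json
import re

TAG_RE = re.compile(r"<function=(?P<name>[^>]+)>(?P<args>\{.*?\})</function>")

def parse_tool_call(reply: str) -> dict | None:
    """Best-effort parse of a single tool call in either format.

    Assumed formats:
      JSON variant: {"name": "...", "parameters": {...}}
      tag variant:  <function=name>{...}</function>
    Returns None if the reply looks like ordinary text rather than a tool call.
    """
    reply = reply.strip()
    match = TAG_RE.search(reply)
    if match:
        return {"name": match.group("name"),
                "parameters": json.loads(match.group("args"))}
    try:
        obj = json.loads(reply)
    except json.JSONDecodeError:
        return None
    if isinstance(obj, dict) and "name" in obj and "parameters" in obj:
        return obj
    return None
```

Anything that fails to parse can then be treated as a plain chat reply, which is one way to soften the false positives mentioned above.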

Rocketknight1 commented 1 month ago

Unfortunately, we kind of have to pick one for the template! One thing we noticed is that with the current JSON template, 8B makes tool calls correctly, but sometimes fails to use the results correctly in chat - not sure if this is an issue with the system message we used, since it was all copied from the doc.

My suspicion is that an alternate prompt would fix a lot of this, and we'd prefer to have a clear answer on the best way to do things rather than several options!

hardikjshah commented 1 month ago

We updated the default tool format to be JSON-based and recommend following that.

https://github.com/meta-llama/llama-agentic-system/pull/45
https://github.com/meta-llama/llama-stack/pull/29
https://github.com/meta-llama/llama-models/pull/110

The code also supports the <function> format and can be extended to support other formats in the future if needed. We are working with other teams to reconcile and update the website to reflect these changes.

Use this command to get the latest recommended format:

`llama model template --name system-custom-tools-only`


Some caveats,

Hope this helps resolve the confusion. Again, thanks for raising these issues, it helps us get better and improve with each version.

Watebear commented 1 month ago

Hello, I attempted to use the prompt concatenation method mentioned above to test BFCL, but the AST summary score only reached 50.82%. Below is an example of the input I constructed. Could you provide an input format that can reproduce the evaluation results from the report? Thanks!

el-hash-1 commented 4 weeks ago

@hardikjshah I think the `llama model template --name assistant-custom-tool-call` template would also need to be updated to the JSON format.