Closed madroidmaq closed 2 months ago
I tried this after installing your other PR with function calling support. For example, Qwen consistently replies with the tool/function call and then stops:
{"name": "get_web_search_results", "arguments": {"keyword": "OpenAI function calling format"}}
Shouldn't the "finish_reason" be "tool_calls" so that it's OpenAI-compatible, as per https://platform.openai.com/docs/guides/function-calling ? I think that's how apps like TypingMind and other front ends know there's a tool to call.
You are right, "finish_reason" should be "tool_calls." As mentioned in the link above, the function results are still manually parsed, meaning there is currently no good way to determine if the end reason is tool_calls. This PR is not a complete feature implementation, and perhaps when the tokenizer supports a similar decode_function_call function, we can fully support the function calling feature.
> You are right, "finish_reason" should be "tool_calls."

Does it make sense to do something like:
- Check if `tools` is in the request
- If it is and the stop condition is met, set the finish_reason to `tool_calls`

?
The situation may not be that simple, because including the `tools` parameter does not guarantee that the model calls a function. For example, the following request, which includes `tools`, returns a plain-text description.
The approach I currently recommend is:
Of course, as mentioned above, this submission does not fully support the Function Calling feature. Full support depends on the Hugging Face transformers library providing similar functions.
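The two points in this exchange (a request with `tools` may still return plain text, and the output has to be parsed by hand) can be combined into a rough sketch: report `tool_calls` only when the request carried tools and the generated text actually parses as a call. This is an illustration, not the code in this PR. `looks_like_tool_call` and `choose_finish_reason` are hypothetical helpers, and the check assumes the Qwen-style bare JSON object with `name` and `arguments` keys quoted earlier in the thread.

```python
import json


def looks_like_tool_call(text: str) -> bool:
    """Return True if text is a JSON object with 'name' and 'arguments' keys.

    Assumption: the model emits a bare JSON object, as in the Qwen reply
    quoted earlier. Other models wrap the call differently.
    """
    try:
        obj = json.loads(text.strip())
    except json.JSONDecodeError:
        return False
    return isinstance(obj, dict) and "name" in obj and "arguments" in obj


def choose_finish_reason(request: dict, text: str, hit_length_limit: bool) -> str:
    """Pick an OpenAI-style finish_reason for one completed generation."""
    if hit_length_limit:
        return "length"
    # Having tools in the request is not enough: the reply must parse as a call.
    if request.get("tools") and looks_like_tool_call(text):
        return "tool_calls"
    return "stop"
```

With this shape, a plain-text answer to a request that includes tools still ends with "stop", which matches the objection raised above.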
@awni , @madroidmaq ,
How can we check whether "finish_reason" is "stop" in an MLXLLM response when using Llama 3.2?
I am testing Llama 3.2 tool calls using https://github.com/mainframecomputer/fullmoon-ios
For a first test I used the prompt from the example at https://www.llama.com/docs/model-cards-and-prompt-formats/llama3_2/
The MLXLLM response from Llama 3.2 is below, but I don't see a "finish_reason" of "stop" anywhere:
<|begin_of_text|><|start_header_id|>user<|end_header_id|>
Questions: Can you retrieve the details for the user with the ID 7890, who has black as their special request?
Here is a list of functions in JSON format that you can invoke:
[
{
"name": "get_user_info",
"description": "Retrieve details for a specific user by their unique identifier. Note that the provided function is in Python 3 syntax.",
"parameters": {
"type": "dict",
"required": [
"user_id"
],
"properties": {
"user_id": {
"type": "integer",
"description": "The unique identifier of the user. It is used to fetch the specific user details from the database."
},
"special": {
"type": "string",
"description": "Any special information or parameters that need to be considered while fetching user details.",
"default": "none"
}
}
}
}
]
Should you decide to return the function call(s), Put it in the format of [func1(params_name=params_value, params_name2=params_value2...), func2(params)]
NO other text MUST be included.<|eot_id|><|start_header_id|>assistant<|end_header_id|>
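The prompt above asks the model to answer in the form `[func1(params_name=params_value, ...), func2(params)]`. As a minimal sketch, a reply in that shape could be turned into name/arguments pairs with Python's `ast` module. This assumes the reply is exactly a Python-style list of calls with keyword arguments that are literals. `parse_pythonic_calls` is a hypothetical helper, not part of MLXLLM or mlx-lm, and the sample reply in the usage lines is an assumed one, not output captured from the model.

```python
import ast


def parse_pythonic_calls(text: str) -> list[dict]:
    """Parse '[f(a=1), g(b="x")]' into [{'name': ..., 'arguments': {...}}, ...]."""
    tree = ast.parse(text.strip(), mode="eval")
    if not isinstance(tree.body, ast.List):
        raise ValueError("expected a list of function calls")
    calls = []
    for node in tree.body.elts:
        # Accept only plain calls such as get_user_info(...), not obj.method(...).
        if not (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)):
            raise ValueError("expected a plain function call")
        if node.args or any(kw.arg is None for kw in node.keywords):
            raise ValueError("only keyword arguments are supported")
        # literal_eval accepts AST nodes and rejects anything that is not a literal.
        arguments = {kw.arg: ast.literal_eval(kw.value) for kw in node.keywords}
        calls.append({"name": node.func.id, "arguments": arguments})
    return calls


# Assumed sample reply in the format the prompt requests.
print(parse_pythonic_calls("[get_user_info(user_id=7890, special='black')]"))
# [{'name': 'get_user_info', 'arguments': {'user_id': 7890, 'special': 'black'}}]
```

Because the text is only parsed and never executed, a malformed or unexpected reply raises an error instead of running model-generated code.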
Fixes #784. Related PR: #995.
Some examples of model requests:
meta-llama/Llama-3.2-1B-Instruct
Qwen/Qwen2.5-0.5B-Instruct
mistralai/Mistral-7B-Instruct-v0.3
View the meta-llama/Llama-3.2-1B-Instruct cURL & response
#### meta-llama/Llama-3.2-1B-Instruct

- curl

```shell
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "meta-llama/Llama-3.2-1B-Instruct",
    "messages": [
      {
        "role": "user",
        "content": "What is the weather in San Francisco?"
      }
    ],
    "tools": [
      {
        "type": "function",
        "function": {
          "name": "get_current_weather",
          "description": "Get the current weather",
          "parameters": {
            "type": "object",
            "properties": {
              "location": {
                "type": "string",
                "description": "The city and country, eg. San Francisco, USA"
              },
              "format": {
                "type": "string",
                "enum": ["celsius", "fahrenheit"]
              }
            },
            "required": ["location", "format"]
          }
        }
      }
    ]
  }'
```

- response

```json
{
  "id": "chatcmpl-ce03f09d-fdaf-4a77-b421-146cef104968",
  "system_fingerprint": "fp_0f3acf0e-bc4e-45bc-b2a9-f0d5e6e500ea",
  "object": "chat.completions",
  "model": "meta-llama/Llama-3.2-1B-Instruct",
  "created": 1727596395,
  "choices": [{
    "index": 0,
    "logprobs": {
      "token_logprobs": [
        -0.125, 0.0, -0.625, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
        0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
        0.0, 0.0, 0.0, 0.0, -1.25, 0.0, 0.0, 0.0, 0.0, 0.0,
        0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
        0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
        0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
        0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
        0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
        0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
        0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0
      ],
      "top_logprobs": [],
      "tokens": [
        128010, 5018, 1337, 794, 330, 1723, 498, 330, 1723, 794,
        330, 456, 11327, 70464, 498, 330, 14105, 794, 5324, 2588,
        794, 330, 24661, 13175, 11, 7427, 498, 330, 2293, 794,
        330, 66, 41347, 32075, 128008, 128006, 78191, 128007, 271, 128010,
        5018, 1337, 794, 330, 1723, 498, 330, 1723, 794, 330,
        456, 11327, 70464, 498, 330, 14105, 794, 5324, 2588, 794,
        330, 24661, 13175, 11, 7427, 498, 330, 2293, 794, 330,
        69, 49010, 32075, 128008, 128006, 78191, 128007, 271, 128010, 5018,
        1337, 794, 330, 1723, 498, 330, 1723, 794, 330, 456,
        11327, 70464, 498, 330, 14105, 794, 5324, 2588, 794, 330
      ]
    },
    "finish_reason": "length",
    "message": {
      "role": "assistant",
      "content": "<|python_tag|>{\"type\": \"function\", \"function\": \"get_current_weather\", \"parameters\": {\"location\": \"San Francisco, USA\", \"format\": \"celsius\"}}<|eom_id|><|start_header_id|>assistant<|end_header_id|>\n\n<|python_tag|>{\"type\": \"function\", \"function\": \"get_current_weather\", \"parameters\": {\"location\": \"San Francisco, USA\", \"format\": \"fahrenheit\"}}<|eom_id|><|start_header_id|>assistant<|end_header_id|>\n\n<|python_tag|>{\"type\": \"function\", \"function\": \"get_current_weather\", \"parameters\": {\"location\": \""
    }
  }],
  "usage": {
    "prompt_tokens": 232,
    "completion_tokens": 100,
    "total_tokens": 332
  }
}
```

View the Qwen/Qwen2.5-0.5B-Instruct cURL & response
#### Qwen/Qwen2.5-0.5B-Instruct

curl:

```shell
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen/Qwen2.5-0.5B-Instruct",
    "messages": [
      {
        "role": "user",
        "content": "What is the weather in San Francisco?"
      }
    ],
    "tools": [
      {
        "type": "function",
        "function": {
          "name": "get_current_weather",
          "description": "Get the current weather",
          "parameters": {
            "type": "object",
            "properties": {
              "location": {
                "type": "string",
                "description": "The city and country, eg. San Francisco, USA"
              },
              "format": {
                "type": "string",
                "enum": ["celsius", "fahrenheit"]
              }
            },
            "required": ["location", "format"]
          }
        }
      }
    ]
  }'
```

---

```json
{
  "id": "chatcmpl-a704f0ce-16dd-45ef-8620-b7dce4b6e8ed",
  "system_fingerprint": "fp_2e665298-ed11-4424-b23f-812fda71d9d4",
  "object": "chat.completions",
  "model": "Qwen/Qwen2.5-0.5B-Instruct",
  "created": 1727594741,
  "choices": [
    {
      "index": 0,
      "finish_reason": "stop",
      "message": {
        "role": "assistant",
        "content": "
```

View the mistralai/Mistral-7B-Instruct-v0.3 cURL & response
#### mistralai/Mistral-7B-Instruct-v0.3

- curl

```shell
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "mistralai/Mistral-7B-Instruct-v0.3",
    "messages": [
      {
        "role": "user",
        "content": "What is the weather in San Francisco?"
      }
    ],
    "tools": [
      {
        "type": "function",
        "function": {
          "name": "get_current_weather",
          "description": "Get the current weather",
          "parameters": {
            "type": "object",
            "properties": {
              "location": {
                "type": "string",
                "description": "The city and country, eg. San Francisco, USA"
              },
              "format": {
                "type": "string",
                "enum": ["celsius", "fahrenheit"]
              }
            },
            "required": ["location", "format"]
          }
        }
      }
    ]
  }'
```

- response

```json
{
  "id": "chatcmpl-c724d955-63d3-404d-8af5-e775dfecc27e",
  "system_fingerprint": "fp_bcee6a65-1462-44ec-9272-275e3333063c",
  "object": "chat.completions",
  "model": "mistralai/Mistral-7B-Instruct-v0.3",
  "created": 1727597917,
  "choices": [{
    "index": 0,
    "logprobs": {
      "token_logprobs": [
        0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
        0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
        0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
        0.0, 0.0, 0.0, 0.0, 0.0, -0.625
      ],
      "top_logprobs": [],
      "tokens": [
        5, 1501, 7567, 1629, 2032, 1113, 1295, 29498, 3790, 29498,
        1537, 1991, 1316, 1113, 17452, 2032, 10598, 3501, 2032, 1113,
        18672, 10454, 29493, 7803, 1316, 1113, 4530, 2032, 1113, 29485,
        1958, 3938, 29507, 1743, 29561, 2
      ]
    },
    "finish_reason": "stop",
    "message": {
      "role": "assistant",
      "content": "[TOOL_CALLS] [{\"name\": \"get_current_weather\", \"arguments\": {\"location\": \"San Francisco, USA\", \"format\": \"celsius\"}}]"
    }
  }],
  "usage": {
    "prompt_tokens": 114,
    "completion_tokens": 36,
    "total_tokens": 150
  }
}
```

Limitation
Currently, chat_template is used to encode and process the input, but there is no unified method for handling the output yet, so each model's output must be parsed manually. For details, see: Tool Use, Unified.
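To make the limitation concrete, here is a rough sketch of what per-model manual parsing can look like for two of the output shapes shown in the responses above: Llama 3.2's `<|python_tag|>{...}<|eom_id|>` and Mistral's `[TOOL_CALLS] [...]`. The function names and the `PARSERS` table are hypothetical and not part of this PR or of mlx-lm. The formats are inferred only from those two sample responses, so other checkpoints or templates may differ.

```python
import json


def parse_llama_tool_calls(content: str) -> list[dict]:
    """Extract calls from '<|python_tag|>{...}<|eom_id|>' segments."""
    decoder = json.JSONDecoder()
    calls = []
    for chunk in content.split("<|python_tag|>")[1:]:
        try:
            # raw_decode reads one JSON value and ignores the trailing tags.
            obj, _ = decoder.raw_decode(chunk.lstrip())
        except json.JSONDecodeError:
            continue  # e.g. a call cut off by the max-token limit
        if isinstance(obj, dict) and "function" in obj and "parameters" in obj:
            calls.append({"name": obj["function"], "arguments": obj["parameters"]})
    return calls


def parse_mistral_tool_calls(content: str) -> list[dict]:
    """Extract calls from '[TOOL_CALLS] [{"name": ..., "arguments": ...}]'."""
    marker = "[TOOL_CALLS]"
    text = content.strip()
    if not text.startswith(marker):
        return []
    try:
        items = json.loads(text[len(marker):])
    except json.JSONDecodeError:
        return []
    return [
        {"name": item["name"], "arguments": item["arguments"]}
        for item in items
        if isinstance(item, dict) and "name" in item and "arguments" in item
    ]


# Hypothetical dispatch table keyed by model family.
PARSERS = {
    "llama": parse_llama_tool_calls,
    "mistral": parse_mistral_tool_calls,
}
```

Each new model family needs its own entry in such a table, which is the maintenance cost that a unified output-decoding method would remove.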