theroyallab / tabbyAPI

An OAI compatible exllamav2 API that's both lightweight and fast
GNU Affero General Public License v3.0

[BUG] Tool Calling not working for Llama 3.2 3B #234

Open raisbecka opened 1 week ago

raisbecka commented 1 week ago

OS

Windows

GPU Library

CUDA 12.x

Python version

3.11

Describe the bug

I am using Docker under WSL2 with CUDA. Regular text generation runs really well and as expected, but tool calling doesn't work even when the exact name of the tool is in the prompt. I am using the default prompt template that comes with the Llama 3.2 exl2 model (https://huggingface.co/bartowski/Llama-3.2-3B-Instruct-exl2).

Here's the thing: using Ollama on Windows, if I include a tool list with the prompt, the tool is called correctly about 95% of the time. For regular chat I simply omit the tool list from the request, since otherwise this smaller model tends to hallucinate tool calls. That workflow works very consistently for me, with the same default prompt template.

I would prefer to continue using the faster TabbyAPI + ExLlamav2 server, but I need a reliable way to call tools. I know in advance when a prompt should result in at least one tool call, so how can I use this to my advantage (like I do with Ollama)?
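For reference, here is a minimal sketch of the workflow I have in mind, using the openai Python client pointed at the local TabbyAPI server. The base URL, API key, and the cut-down tool schema below are placeholders for illustration, not my exact setup:

```python
from openai import OpenAI

# Placeholder base URL / API key for a local TabbyAPI instance; adjust to your setup.
client = OpenAI(base_url="http://localhost:5000/v1", api_key="dummy")

# Simplified version of the create_log_entry tool from the request log below.
tools = [{
    "type": "function",
    "function": {
        "name": "create_log_entry",
        "description": "Create a new log entry",
        "parameters": {
            "type": "object",
            "properties": {
                "Logbook Title": {"type": "string", "description": "The title of the logbook."},
                "Details": {"type": "string", "description": "The details of the log entry."},
            },
            "required": ["Logbook Title", "Details"],
        },
    },
}]

def chat(messages, expect_tool_call: bool):
    # Attach the tool list only when the prompt should trigger a tool call;
    # omit it entirely for normal dialog so the small model doesn't hallucinate calls.
    kwargs = {"tools": tools} if expect_tool_call else {}
    return client.chat.completions.create(
        model="Llama-3.2-3B-Instruct-exl2",
        messages=messages,
        temperature=0,
        **kwargs,
    )
```

With Ollama, calling the equivalent of chat(messages, expect_tool_call=True) reliably produces a tool call; with TabbyAPI it does not.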

Reproduction steps

Expected behavior

I should be able to trigger a tool call when I request one and supply a tools param with matching tool names. I should also be able to switch back to normal dialog by supplying an empty tool list.

Logs

Below is the log output of the request being sent by the openai module:

2024-11-11 11:21:19,471 - DEBUG - Request options: {'method': 'post', 'url': '/chat/completions', 'files': None, 'json_data': {'messages': [{'role': 'system', 'content': 'You are a helpful AI assistant. \n\t\t\tWhen the user wants to finish a conversation, \n\t\t\tyou will always respond with "Goodbye!". \n\t\t\tNever use this word unless the user wants to end the conversation. If the user\n\t\t\trequests to end the coversation, you must say it. Use the best available tool to fullfill this request.'}, {'role': 'user', 'content': 'Create a log entry for a Georgina water treatment plant with Stephen Beatty as the OIC for a broken pump on April 1, 2024'}], 'model': 'Llama-3.2-3B-Instruct-exl2', 'temperature': 0, 'tools': [{'type': 'function', 'function': {'name': 'create_log_entry', 'description': 'Create a new log entry', 'parameters': {'type': 'object', 'properties': {'Logbook Title': {'type': 'string', 'description': 'The title of the logbook.'}, 'Event Date': {'type': 'datetime', 'description': 'The date/time the logged event took place.'}, 'OIC First Name': {'type': 'string', 'description': 'The last name of the person assigned as OIC.'}, 'OIC Last Name': {'type': 'string', 'description': 'The last name of the person assigned as OIC.'}, 'Details': {'type': 'string', 'description': 'The details of the log entry.'}}, 'required': ['Logbook Title', 'Event Date', 'Details'], 'additionalProperties': False}}}]}}

2024-11-11 11:21:19,472 - DEBUG - connect_tcp.started host='localhost' port=5000 local_address=None timeout=5.0 socket_options=None

Additional context

Otherwise this is working really well. Love it. Just having the issue with calling tools!

Acknowledgements

raisbecka commented 1 week ago

To add some further information:

I get the response below when prompting TabbyAPI with a tools list and a tool-use prompt:

TabbyAPI Response: { "id": "chatcmpl-31cb9af0f64346faab07106a098a93fb", "choices": [ { "index": 0, "finish_reason": "stop", "stop_str": "<|tool_start|>", "message": { "role": "assistant", "content": "", "tool_calls": [] }, "logprobs": null } ], "created": 1731350901, "model": "Llama-3.2-3B-Instruct-exl2", "object": "chat.completion", "usage": { "prompt_tokens": 575, "completion_tokens": 1, "total_tokens": 576 } }

The finish reason and stop string don't look right to me, but I'm not sure. It looks like generation is being cut off right as the model starts to formulate the tool call? I could be wrong...
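For context, this is roughly how I inspect the response on my side (a sketch using the openai Python client; response is whatever client.chat.completions.create returns):

```python
# Sketch: inspect the response returned by client.chat.completions.create(...).
choice = response.choices[0]
if choice.message.tool_calls:
    for call in choice.message.tool_calls:
        print(call.function.name, call.function.arguments)
else:
    # This is what I get back from TabbyAPI: finish_reason "stop",
    # empty content, and an empty tool_calls list.
    print(choice.finish_reason, repr(choice.message.content))
```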

raisbecka commented 1 week ago

One last thing: I just had Cline (Claude Dev) swap my backend from TabbyAPI + ExLlamav2 to Ollama, and my function calls are working again, almost 100% accurate. So this is not a config or model issue; it seems to be something else.