[BUG] Tool Calling not working for Llama 3.2 3B

raisbecka commented 1 week ago

OS

Windows

GPU Library

CUDA 12.x

Python version

3.11

Describe the bug

I am using Docker on WSL2 with CUDA. Regular text generation runs really well and as expected, but tool calling doesn't work, even when the exact name of the tool is in the prompt. I am using the default prompt template that comes with the Llama 3.2 exl2 quant (https://huggingface.co/bartowski/Llama-3.2-3B-Instruct-exl2).

Here's the thing: Using Ollama on Windows, if I include a tool list with the prompt, it works correctly 95% of the time. For regular chat, I simply omit the tool list when I send the request - otherwise this smaller model tends to hallucinate. But this workflow works very consistently for me. This is with the same default prompt template.

I would prefer to continue using the faster TabbyAPI + ExLlamav2 server, but I need a reliable way to call tools. I know in advance when a prompt should result in at least one tool call, so how can I use this to my advantage (like I do with Ollama)?
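In the OpenAI-style chat API, the usual way to express "this prompt must produce a tool call" is the `tool_choice` request field, where `"required"` asks for at least one tool call. The sketch below shows how that could be combined with omitting the tool list for regular chat. `request_kwargs` is a hypothetical helper, and whether TabbyAPI honours `tool_choice` is an assumption that this report does not confirm.

```python
from typing import Any


def request_kwargs(messages: list[dict[str, Any]],
                   tools: list[dict[str, Any]],
                   expect_tool_call: bool) -> dict[str, Any]:
    """Build keyword arguments for client.chat.completions.create().

    Hypothetical helper: tools are only sent when a tool call is expected,
    mirroring the workflow described for Ollama above.
    """
    kwargs: dict[str, Any] = {"messages": messages, "temperature": 0}
    if expect_tool_call:
        kwargs["tools"] = tools
        # Assumption: the server honours the OpenAI `tool_choice` field.
        kwargs["tool_choice"] = "required"
    return kwargs
```

For regular chat, calling the helper with `expect_tool_call=False` leaves out both `tools` and `tool_choice`, so the request looks like a plain chat completion.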

Reproduction steps

1. Use the model https://huggingface.co/bartowski/Llama-3.2-3B-Instruct-exl2/ (the 3_5 branch).
2. Use the default prompt template included with the model.
3. Use the openai module in Python to connect.
4. Try regular chat messages with no tools parameter supplied. They work.
5. Try tool prompts with the tools param set to a list of properly defined tools. They don't work.
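The steps above can be sketched roughly as follows. The tool definition is a trimmed version of the one in the logged request. The base URL (`http://localhost:5000/v1`) and the API key are placeholders: the log only shows `localhost`, port 5000 and the `/chat/completions` path, so the `/v1` prefix is an assumption. `build_request` and `send` are hypothetical helper names.

```python
from typing import Any

MODEL = "Llama-3.2-3B-Instruct-exl2"

# Trimmed version of the tool definition from the logged request.
CREATE_LOG_ENTRY: dict[str, Any] = {
    "type": "function",
    "function": {
        "name": "create_log_entry",
        "description": "Create a new log entry",
        "parameters": {
            "type": "object",
            "properties": {
                "Logbook Title": {"type": "string",
                                  "description": "The title of the logbook."},
                "Details": {"type": "string",
                            "description": "The details of the log entry."},
            },
            "required": ["Logbook Title", "Details"],
        },
    },
}


def build_request(prompt: str, use_tools: bool) -> dict[str, Any]:
    """Return the request body; tools are attached only when use_tools is True."""
    request: dict[str, Any] = {
        "model": MODEL,
        "temperature": 0,
        "messages": [{"role": "user", "content": prompt}],
    }
    if use_tools:
        request["tools"] = [CREATE_LOG_ENTRY]
    return request


def send(request: dict[str, Any],
         base_url: str = "http://localhost:5000/v1",  # assumed endpoint
         api_key: str = "placeholder-key"):           # assumed placeholder
    """Send the request with the openai client (imported lazily)."""
    from openai import OpenAI

    client = OpenAI(base_url=base_url, api_key=api_key)
    return client.chat.completions.create(**request)
```

With a server running, `send(build_request("Hello", use_tools=False))` corresponds to step 4, and `send(build_request("Create a log entry ...", use_tools=True)).choices[0].message.tool_calls` corresponds to step 5.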

Expected behavior

I should be able to trigger a tool call when I request one and supply a tools param with matching tool names. I should also be able to switch back to normal dialog by supplying an empty tool list.

Logs

Below is the log output of the request sent by the openai module:

2024-11-11 11:21:19,471 - DEBUG - Request options: {'method': 'post', 'url': '/chat/completions', 'files': None, 'json_data': {'messages': [{'role': 'system', 'content': 'You are a helpful AI assistant. \n\t\t\tWhen the user wants to finish a conversation, \n\t\t\tyou will always respond with "Goodbye!". \n\t\t\tNever use this word unless the user wants to end the conversation. If the user\n\t\t\trequests to end the coversation, you must say it. Use the best available tool to fullfill this request.'}, {'role': 'user', 'content': 'Create a log entry for a Georgina water treatment plant with Stephen Beatty as the OIC for a broken pump on April 1, 2024'}], 'model': 'Llama-3.2-3B-Instruct-exl2', 'temperature': 0, 'tools': [{'type': 'function', 'function': {'name': 'create_log_entry', 'description': 'Create a new log entry', 'parameters': {'type': 'object', 'properties': {'Logbook Title': {'type': 'string', 'description': 'The title of the logbook.'}, 'Event Date': {'type': 'datetime', 'description': 'The date/time the logged event took place.'}, 'OIC First Name': {'type': 'string', 'description': 'The last name of the person assigned as OIC.'}, 'OIC Last Name': {'type': 'string', 'description': 'The last name of the person assigned as OIC.'}, 'Details': {'type': 'string', 'description': 'The details of the log entry.'}}, 'required': ['Logbook Title', 'Event Date', 'Details'], 'additionalProperties': False}}}]}}

2024-11-11 11:21:19,472 - DEBUG - connect_tcp.started host='localhost' port=5000 local_address=None timeout=5.0 socket_options=None
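One detail in the logged request that may be worth ruling out: the `Event Date` parameter declares `'type': 'datetime'`, which is not one of the JSON Schema primitive types (`string`, `number`, `integer`, `boolean`, `object`, `array`, `null`). Whether this has any effect on TabbyAPI is not established here, and the same request reportedly works with Ollama. The sketch below is a hypothetical check (`non_standard_types` is not part of any library) applied to the property types copied from the log, with descriptions omitted.

```python
JSON_SCHEMA_TYPES = {"string", "number", "integer", "boolean",
                     "object", "array", "null"}


def non_standard_types(parameters: dict) -> dict[str, str]:
    """Return {property name: declared type} for types outside JSON Schema."""
    return {
        name: spec["type"]
        for name, spec in parameters.get("properties", {}).items()
        if isinstance(spec.get("type"), str)
        and spec["type"] not in JSON_SCHEMA_TYPES
    }


# Property types as they appear in the logged request.
PARAMETERS = {
    "type": "object",
    "properties": {
        "Logbook Title": {"type": "string"},
        "Event Date": {"type": "datetime"},
        "OIC First Name": {"type": "string"},
        "OIC Last Name": {"type": "string"},
        "Details": {"type": "string"},
    },
}

print(non_standard_types(PARAMETERS))  # {'Event Date': 'datetime'}
```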

Additional context

Otherwise this is working really well. Love it. Just having the issue with calling tools!

Acknowledgements

[X] I have looked for similar issues before submitting this one.
[X] I have read the disclaimer, and this issue is related to a code bug. If I have a question, I will use the Discord server.
[X] I understand that the developers have lives and my issue will be answered when possible.
[X] I understand the developers of this program are human, and I will ask my questions politely.

raisbecka commented 1 week ago

To add further material to this:

I get the below response when prompting TabbyAPI with tools and a tool use prompt:

TabbyAPI Response:
{
  "id": "chatcmpl-31cb9af0f64346faab07106a098a93fb",
  "choices": [
    {
      "index": 0,
      "finish_reason": "stop",
      "stop_str": "<|tool_start|>",
      "message": {
        "role": "assistant",
        "content": "",
        "tool_calls": []
      },
      "logprobs": null
    }
  ],
  "created": 1731350901,
  "model": "Llama-3.2-3B-Instruct-exl2",
  "object": "chat.completion",
  "usage": {
    "prompt_tokens": 575,
    "completion_tokens": 1,
    "total_tokens": 576
  }
}

The finish_reason and stop_str values don't look right to me, though I'm not sure. It LOOKS like the reply is being stopped right as the model starts to formulate the tool call: only 1 completion token was generated, and tool_calls is empty. I could be wrong...
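The symptom can be checked programmatically. The sketch below is a hypothetical helper (`stopped_at_tool_start` is not a TabbyAPI or openai function) that flags a response which stopped on the `<|tool_start|>` string while returning neither content nor tool calls. The sample dict is a trimmed copy of the response above.

```python
def stopped_at_tool_start(response: dict,
                          marker: str = "<|tool_start|>") -> bool:
    """True if the first choice stopped on `marker` with an empty message."""
    choice = response["choices"][0]
    message = choice["message"]
    return (
        choice.get("stop_str") == marker
        and not message.get("tool_calls")
        and not message.get("content")
    )


# Trimmed copy of the response shown above.
SAMPLE = {
    "choices": [
        {
            "index": 0,
            "finish_reason": "stop",
            "stop_str": "<|tool_start|>",
            "message": {"role": "assistant", "content": "", "tool_calls": []},
        }
    ],
    "usage": {"prompt_tokens": 575, "completion_tokens": 1, "total_tokens": 576},
}

print(stopped_at_tool_start(SAMPLE))  # True
```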

raisbecka commented 1 week ago

One last thing: I just had Cline (Claude Dev) swap my backend from ExLlamav2 and TabbyAPI to Ollama, and my function calls are working again. They are almost 100% accurate, so this does not look like a config or model issue - it seems to be something else.

theroyallab / tabbyAPI