sigoden / aichat

All-in-one AI CLI tool featuring Chat-REPL, Shell Assistant, RAG, AI tools & agents, with access to OpenAI, Claude, Gemini, Ollama, Groq, and more.
Apache License 2.0

local LLM function calling broken in streaming mode, infinite loop in non-stream mode #580

Closed · blob42 closed this issue 3 months ago

blob42 commented 3 months ago

Describe the bug

I am trying to use function calling with a local LLM. With Ollama I have not found a way to do it yet.

LocalAI, however, has full support for function calling, which I tested directly with API calls; it works as expected.

Using the default README example does not work; it fails with: The get_current_weather call has invalid arguments:

I found out that this is due to the streaming call: the tool call is returned across multiple chunks.

LocalAI-side streaming API results for the function call:

3:01PM DBG Sending chunk: {"created":1718031283,"object":"chat.completion.chunk","id":"f5f63d03-9bd7-42ba-9e25-bfc35ed44f47","model":"llama3-8b-inst","choices":[{"index":0,"finish_reason":"","delta":{"role":"assistant","content":null,"tool_calls":[{"index":0,"id":"f5f63d03-9bd7-42ba-9e25-bfc35ed44f47","type":"function","function":{"name":"get_current_weather","arguments":""}}]}}],"usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}

3:01PM DBG Sending chunk: {"created":1718031283,"object":"chat.completion.chunk","id":"f5f63d03-9bd7-42ba-9e25-bfc35ed44f47","model":"llama3-8b-inst","choices":[{"index":0,"finish_reason":"","delta":{"role":"assistant","content":"","tool_calls":[{"index":0,"id":"f5f63d03-9bd7-42ba-9e25-bfc35ed44f47","type":"function","function":{"arguments":"{\"location\":\"New York, NY\"}"}}]}}],"usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}
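For context, an OpenAI-compatible client has to accumulate these per-chunk `delta.tool_calls` fragments, keyed by the `index` field, and only parse the arguments as JSON once the stream finishes. A minimal Rust sketch of that accumulation (hypothetical types, not aichat's actual code):

```rust
// Merge streamed tool-call deltas into complete calls, keyed by `index`.
// Hypothetical types for illustration; not aichat's actual implementation.
#[derive(Default)]
struct PartialToolCall {
    id: String,
    name: String,
    arguments: String, // JSON arrives in fragments and must be concatenated
}

fn merge_delta(
    calls: &mut Vec<PartialToolCall>,
    index: usize,
    id: Option<&str>,
    name: Option<&str>,
    arguments: Option<&str>,
) {
    if calls.len() <= index {
        calls.resize_with(index + 1, PartialToolCall::default);
    }
    let call = &mut calls[index];
    if let Some(id) = id {
        call.id = id.to_string();
    }
    if let Some(name) = name {
        call.name = name.to_string();
    }
    if let Some(args) = arguments {
        call.arguments.push_str(args); // append, never overwrite
    }
}

fn main() {
    let mut calls = Vec::new();
    // First chunk: id and name arrive, arguments still empty.
    merge_delta(&mut calls, 0, Some("f5f63d03"), Some("get_current_weather"), Some(""));
    // Second chunk: the arguments arrive in a later delta.
    merge_delta(&mut calls, 0, None, None, Some("{\"location\":\"New York, NY\"}"));
    assert_eq!(calls[0].arguments, "{\"location\":\"New York, NY\"}");
}
```

Parsing only the first chunk's arguments (an empty string) would explain the "invalid arguments" error above.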

With streaming disabled, function calling works but enters an infinite loop (see below).

To Reproduce

aichat -S -m localai:llama3-8b-inst --role %functions% what is the weather in London
Call get_current_weather '{"location":"New York, NY"}'
Call get_current_weather '{"location":"New York, NY"}'
Call get_current_weather '{"location":"New York"}'
Call get_current_weather '{"location":"New York"}'
Call get_current_weather '{"location":"New York"}'
Call get_current_weather '{"location":"London"}'
Call get_current_weather '{"location":"New York, NY"}'
Call get_current_weather '{"location":"New York"}'
Call get_current_weather '{"location":"New York"}'
Call get_current_weather '{"location":"New York"}'
Call get_current_weather '{"location":"London"}'
Call get_current_weather '{"location":"New York"}'
Call get_current_weather '{"location":"New York"}'
Call get_current_weather '{"location":"New York, NY"}'
Call get_current_weather '{"location":"New York"}'
Call get_current_weather '{"location":"London"}'
Call get_current_weather '{"location":"New York"}'
Call get_current_weather '{"location":"London"}'
Call get_current_weather '{"location":"New York"}'

Debug log of a looping function call (non-streaming)

2024-06-10T15:32:22.443Z [DEBUG] aichat::client::openai_compatible: OpenAICompatible Chat Completions Request: http://localai.srvlan:8080/v1/chat/completions {"model":"llama3-8b-inst","messages":[{"role":"user","content":"what is the weather in London"}],"tools":[{"type":"function","function":{"name":"get_current_weather","description":"Get the current weather in a given location.","parameters":{"type":"object","properties":{"location":{"type":"string","description":"The city and optionally the state or country, e.g., \"London\", \"San Francisco, CA\"."}},"required":["location"]}}}],"tool_choice":"auto"}
2024-06-10T15:32:23.091Z [DEBUG] aichat::client::openai: non-stream-data: {"created":1718031283,"object":"chat.completion","id":"f5f63d03-9bd7-42ba-9e25-bfc35ed44f47","model":"llama3-8b-inst","choices":[{"index":0,"finish_reason":"tool_calls","message":{"role":"assistant","content":"","tool_calls":[{"index":0,"id":"f5f63d03-9bd7-42ba-9e25-bfc35ed44f47","type":"function","function":{"name":"get_current_weather","arguments":"{\"location\":\"New York\"}"}}]}}],"usage":{"prompt_tokens":275,"completion_tokens":19,"total_tokens":294}}
2024-06-10T15:32:23.302Z [DEBUG] aichat::client::openai_compatible: OpenAICompatible Chat Completions Request: http://localai.srvlan:8080/v1/chat/completions {"model":"llama3-8b-inst","messages":[{"role":"user","content":"what is the weather in London"},{"role":"assistant","content":"","tool_calls":[{"id":"f5f63d03-9bd7-42ba-9e25-bfc35ed44f47","type":"function","function":{"name":"get_current_weather","arguments":"{\"location\":\"New York\"}"}}]},{"role":"tool","content":"{\"output\":\"New+York: ☀️   🌡️+19°C 🌬️→4.7m/s\\n\"}","tool_call_id":"f5f63d03-9bd7-42ba-9e25-bfc35ed44f47"}],"tools":[{"type":"function","function":{"name":"get_current_weather","description":"Get the current weather in a given location.","parameters":{"type":"object","properties":{"location":{"type":"string","description":"The city and optionally the state or country, e.g., \"London\", \"San Francisco, CA\"."}},"required":["location"]}}}],"tool_choice":"auto"}
2024-06-10T15:32:24.004Z [DEBUG] aichat::client::openai: non-stream-data: {"created":1718031283,"object":"chat.completion","id":"f5f63d03-9bd7-42ba-9e25-bfc35ed44f47","model":"llama3-8b-inst","choices":[{"index":0,"finish_reason":"tool_calls","message":{"role":"assistant","content":"","tool_calls":[{"index":0,"id":"f5f63d03-9bd7-42ba-9e25-bfc35ed44f47","type":"function","function":{"name":"get_current_weather","arguments":"{\"location\":\"New York\"}"}}]}}],"usage":{"prompt_tokens":275,"completion_tokens":20,"total_tokens":295}}
2024-06-10T15:32:24.226Z [DEBUG] aichat::client::openai_compatible: OpenAICompatible Chat Completions Request: http://localai.srvlan:8080/v1/chat/completions {"model":"llama3-8b-inst","messages":[{"role":"user","content":"what is the weather in London"},{"role":"assistant","content":"","tool_calls":[{"id":"f5f63d03-9bd7-42ba-9e25-bfc35ed44f47","type":"function","function":{"name":"get_current_weather","arguments":"{\"location\":\"New York\"}"}},{"id":"f5f63d03-9bd7-42ba-9e25-bfc35ed44f47","type":"function","function":{"name":"get_current_weather","arguments":"{\"location\":\"New York\"}"}}]},{"role":"tool","content":"{\"output\":\"New+York: ☀️   🌡️+19°C 🌬️→4.7m/s\\n\"}","tool_call_id":"f5f63d03-9bd7-42ba-9e25-bfc35ed44f47"},{"role":"tool","content":"{\"output\":\"New+York: ☀️   🌡️+19°C 🌬️→4.7m/s\\n\"}","tool_call_id":"f5f63d03-9bd7-42ba-9e25-bfc35ed44f47"}],"tools":[{"type":"function","function":{"name":"get_current_weather","description":"Get the current weather in a given location.","parameters":{"type":"object","properties":{"location":{"type":"string","description":"The city and optionally the state or country, e.g., \"London\", \"San Francisco, CA\"."}},"required":["location"]}}}],"tool_choice":"auto"}
2024-06-10T15:32:24.917Z [DEBUG] aichat::client::openai: non-stream-data: {"created":1718031283,"object":"chat.completion","id":"f5f63d03-9bd7-42ba-9e25-bfc35ed44f47","model":"llama3-8b-inst","choices":[{"index":0,"finish_reason":"tool_calls","message":{"role":"assistant","content":"","tool_calls":[{"index":0,"id":"f5f63d03-9bd7-42ba-9e25-bfc35ed44f47","type":"function","function":{"name":"get_current_weather","arguments":"{\"location\":\"New York\"}"}}]}}],"usage":{"prompt_tokens":275,"completion_tokens":19,"total_tokens":294}}
2024-06-10T15:32:25.117Z [DEBUG] aichat::client::openai_compatible: OpenAICompatible Chat Completions Request: http://localai.srvlan:8080/v1/chat/completions {"model":"llama3-8b-inst","messages":[{"role":"user","content":"what is the weather in London"},{"role":"assistant","content":"","tool_calls":[{"id":"f5f63d03-9bd7-42ba-9e25-bfc35ed44f47","type":"function","function":{"name":"get_current_weather","arguments":"{\"location\":\"New York\"}"}},{"id":"f5f63d03-9bd7-42ba-9e25-bfc35ed44f47","type":"function","function":{"name":"get_current_weather","arguments":"{\"location\":\"New York\"}"}},{"id":"f5f63d03-9bd7-42ba-9e25-bfc35ed44f47","type":"function","function":{"name":"get_current_weather","arguments":"{\"location\":\"New York\"}"}}]},{"role":"tool","content":"{\"output\":\"New+York: ☀️   🌡️+19°C 🌬️→4.7m/s\\n\"}","tool_call_id":"f5f63d03-9bd7-42ba-9e25-bfc35ed44f47"},{"role":"tool","content":"{\"output\":\"New+York: ☀️   🌡️+19°C 🌬️→4.7m/s\\n\"}","tool_call_id":"f5f63d03-9bd7-42ba-9e25-bfc35ed44f47"},{"role":"tool","content":"{\"output\":\"New+York: ☀️   🌡️+19°C 🌬️→4.7m/s\\n\"}","tool_call_id":"f5f63d03-9bd7-42ba-9e25-bfc35ed44f47"}],"tools":[{"type":"function","function":{"name":"get_current_weather","description":"Get the current weather in a given location.","parameters":{"type":"object","properties":{"location":{"type":"string","description":"The city and optionally the state or country, e.g., \"London\", \"San Francisco, CA\"."}},"required":["location"]}}}],"tool_choice":"auto"}

Expected results

  1. Ollama support for function calling?
  2. Function calling should work in streaming mode.
  3. Non-streaming mode should not enter an infinite loop.

I will try to track down the infinite loop issue and create a PR when I have some time.

sigoden commented 3 months ago

The infinite loop is a bug in the LLM, but AIChat is also responsible for detecting dead loops and aborting.

Let me explain.

Let's first illustrate how a well-behaved LLM handles function calling (without parallel tool calls).

# First Request: Question only
[{"role":"user","content":"what is the weather in Pairs and Berlin?"}]

# First Response: Returns the first function call
{
  "content": null,
  "role": "assistant",
  "tool_calls": [
    {
      "function": {
        "arguments": "{\n  \"location\": \"Paris\"\n}",
        "name": "get_current_weather"
      },
      "id": "call_8e7VqAZILfH0CX4uPVW3lebx",
      "type": "function"
    }
  ]
}

# Second Request: Question + the result of the first function call
[
  {
    "role": "user",
    "content": "what is the weather in Pairs and Berlin?"
  },
  {
    "role": "assistant",
    "content": "",
    "tool_calls": [
      {
        "id": "call_8e7VqAZILfH0CX4uPVW3lebx",
        "type": "function",
        "function": {
          "name": "get_current_weather",
          "arguments": "{\n  \"location\": \"Paris\"\n}"
        }
      }
    ]
  },
  {
    "role": "tool",
    "content": "{\"output\":\"Paris: ☀️   🌡️+12°C 🌬️→1.1m/s\\n\"}",
    "tool_call_id": "call_8e7VqAZILfH0CX4uPVW3lebx"
  }
]

# Second Response: Returns the second function call
{
  "role": "assistant",
  "content": null,
  "tool_calls": [
    {
      "id": "call_clrUp5JmYoUr5WFGmf0t12Q8",
      "type": "function",
      "function": {
        "name": "get_current_weather",
        "arguments": "{\"location\":\"Berlin\"}"
      }
    }
  ]
}

# Third Request: Question + the results of both function calls
[
  {
    "role": "user",
    "content": "what is the weather in Pairs and Berlin?"
  },
  {
    "role": "assistant",
    "content": "",
    "tool_calls": [
      {
        "id": "call_8e7VqAZILfH0CX4uPVW3lebx",
        "type": "function",
        "function": {
          "name": "get_current_weather",
          "arguments": "{\n  \"location\": \"Paris\"\n}"
        }
      },
      {
        "id": "call_clrUp5JmYoUr5WFGmf0t12Q8",
        "type": "function",
        "function": {
          "name": "get_current_weather",
          "arguments": "{\"location\":\"Berlin\"}"
        }
      }
    ]
  },
  {
    "role": "tool",
    "content": "{\"output\":\"Paris: ☀️   🌡️+12°C 🌬️→1.1m/s\\n\"}",
    "tool_call_id": "call_8e7VqAZILfH0CX4uPVW3lebx"
  },
  {
    "role": "tool",
    "content": "{\"output\":\"Berlin: ☁️   🌡️+11°C 🌬️→7.2m/s\\n\"}",
    "tool_call_id": "call_clrUp5JmYoUr5WFGmf0t12Q8"
  }
]

# Third Response: Final Result
{
  "role": "assistant",
  "content": "The current weather in Paris is sunny with a temperature of +12°C and a wind speed of 1.1 m/s.\n\nIn Berlin, the current weather is cloudy with a temperature of +11°C and a wind speed of 7.2 m/s."
}

AIChat checks whether the response contains a function call. If so, it executes the function and sends the result, together with the previous function call results, back to the LLM for further processing, repeating until the response contains no function call.
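In code form, that loop looks roughly like the sketch below; the types and the `call_llm` / `run_tool` helpers are hypothetical stand-ins, not aichat's actual implementation:

```rust
// Sketch of the tool-call loop: keep calling the LLM and executing any
// requested functions until a response arrives with no tool calls.
#[derive(Clone)]
struct ToolCall {
    id: String,
    name: String,
    arguments: String,
}

enum Message {
    User(String),
    Assistant { content: String, tool_calls: Vec<ToolCall> },
    Tool { content: String, tool_call_id: String },
}

struct Response {
    content: Option<String>,
    tool_calls: Option<Vec<ToolCall>>,
}

// Hypothetical stand-ins for the HTTP client and the tool runner.
fn call_llm(_messages: &[Message]) -> Response { unimplemented!() }
fn run_tool(_name: &str, _arguments: &str) -> String { unimplemented!() }

fn chat_with_tools(mut messages: Vec<Message>) -> String {
    loop {
        let response = call_llm(&messages);
        match response.tool_calls {
            // No tool calls: this is the final answer.
            None => return response.content.unwrap_or_default(),
            Some(calls) => {
                // Keep the assistant turn that requested the calls...
                messages.push(Message::Assistant {
                    content: String::new(),
                    tool_calls: calls.clone(),
                });
                // ...then append one "tool" message per executed call.
                for call in calls {
                    let output = run_tool(&call.name, &call.arguments);
                    messages.push(Message::Tool {
                        content: output,
                        tool_call_id: call.id,
                    });
                }
            }
        }
    }
}
```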

Buggy LLMs cannot handle function calling correctly: their responses always contain function calls, eventually leading to an infinite loop.
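A straightforward guard is to cap the number of tool-call rounds and abort when the model repeats an identical call. A sketch of such a check (the constant is illustrative, not aichat's real limit):

```rust
use std::collections::HashSet;

const MAX_TOOL_ROUNDS: usize = 8; // illustrative cap, not aichat's real limit

/// Errors out when the round cap is exceeded or when the same
/// (name, arguments) pair repeats, so the caller can abort the loop.
fn check_round(
    seen: &mut HashSet<(String, String)>,
    round: usize,
    name: &str,
    arguments: &str,
) -> Result<(), String> {
    if round >= MAX_TOOL_ROUNDS {
        return Err(format!("aborted after {MAX_TOOL_ROUNDS} tool-call rounds"));
    }
    // `insert` returns false if the pair was already seen: a dead loop.
    if !seen.insert((name.to_string(), arguments.to_string())) {
        return Err(format!("repeated tool call: {name} {arguments}"));
    }
    Ok(())
}
```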

sigoden commented 3 months ago

@blob42

  1. Ollama does not support function calling yet.
  2. Does LocalAI's function calling in streaming mode follow the OpenAI specification?
  3. #585 will fix the bug.

blob42 commented 3 months ago

Thanks for the quick fix and the detailed explanation. I suspected the LLM, but even with Llama 70B I had the same issue.

Regarding LocalAI: yes, it should follow the OpenAI specification. I tried it with various OpenAI client libraries and it seemed to work.

Closing this issue since the loop bug was fixed.

Thanks for the fantastic work with aichat :)