sigoden / aichat

All-in-one AI CLI tool featuring Chat-REPL, Shell Assistant, RAG, AI tools & agents, with access to OpenAI, Claude, Gemini, Ollama, Groq, and more.
Apache License 2.0

local LLM function calling broken in streaming mode, infinite loop in non-stream mode #580

Closed · blob42 closed this issue 3 months ago

blob42 commented 3 months ago

Describe the bug

I am trying to use function calling with a local LLM. With Ollama I have not found a way to do it yet.

LocalAI, however, has full support for function calling, which I tested directly with API calls; it works as expected.

Using the default README example does not work; it fails with: The get_current_weather call has invalid arguments:

I found out that this is due to the streaming call: the tool call is returned across multiple chunks.

LocalAI-side streaming API results for the function call:

3:01PM DBG Sending chunk: {"created":1718031283,"object":"chat.completion.chunk","id":"f5f63d03-9bd7-42ba-9e25-bfc35ed44f47","model":"llama3-8b-inst","choices":[{"index":0,"finish_reason":"","delta":{"role":"assistant","content":null,"tool_calls":[{"index":0,"id":"f5f63d03-9bd7-42ba-9e25-bfc35ed44f47","type":"function","function":{"name":"get_current_weather","arguments":""}}]}}],"usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}

3:01PM DBG Sending chunk: {"created":1718031283,"object":"chat.completion.chunk","id":"f5f63d03-9bd7-42ba-9e25-bfc35ed44f47","model":"llama3-8b-inst","choices":[{"index":0,"finish_reason":"","delta":{"role":"assistant","content":"","tool_calls":[{"index":0,"id":"f5f63d03-9bd7-42ba-9e25-bfc35ed44f47","type":"function","function":{"arguments":"{\"location\":\"New York, NY\"}"}}]}}],"usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}
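For context, an OpenAI-compatible client has to accumulate these per-chunk `delta.tool_calls` fragments, keyed by the `index` field, and only parse the arguments as JSON once the stream finishes. A minimal Rust sketch of that accumulation (hypothetical types, not aichat's actual code):

```rust
// Merge streamed tool-call deltas into complete calls, keyed by `index`.
// Hypothetical types for illustration; not aichat's actual implementation.
#[derive(Default)]
struct PartialToolCall {
    id: String,
    name: String,
    arguments: String, // JSON arrives in fragments and must be concatenated
}

fn merge_delta(
    calls: &mut Vec<PartialToolCall>,
    index: usize,
    id: Option<&str>,
    name: Option<&str>,
    arguments: Option<&str>,
) {
    if calls.len() <= index {
        calls.resize_with(index + 1, PartialToolCall::default);
    }
    let call = &mut calls[index];
    if let Some(id) = id {
        call.id = id.to_string();
    }
    if let Some(name) = name {
        call.name = name.to_string();
    }
    if let Some(args) = arguments {
        call.arguments.push_str(args); // append, never overwrite
    }
}

fn main() {
    let mut calls = Vec::new();
    // First chunk: id and name arrive, arguments still empty.
    merge_delta(&mut calls, 0, Some("f5f63d03"), Some("get_current_weather"), Some(""));
    // Second chunk: the arguments arrive in a later delta.
    merge_delta(&mut calls, 0, None, None, Some("{\"location\":\"New York, NY\"}"));
    assert_eq!(calls[0].arguments, "{\"location\":\"New York, NY\"}");
}
```

Parsing only the first chunk's arguments (an empty string) would explain the "invalid arguments" error above.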

With streaming disabled, function calling works but enters an infinite loop (see below).

To Reproduce

aichat -S -m localai:llama3-8b-inst --role %functions% what is the weather in London
Call get_current_weather '{"location":"New York, NY"}'
Call get_current_weather '{"location":"New York, NY"}'
Call get_current_weather '{"location":"New York"}'
Call get_current_weather '{"location":"New York"}'
Call get_current_weather '{"location":"New York"}'
Call get_current_weather '{"location":"London"}'
Call get_current_weather '{"location":"New York, NY"}'
Call get_current_weather '{"location":"New York"}'
Call get_current_weather '{"location":"New York"}'
Call get_current_weather '{"location":"New York"}'
Call get_current_weather '{"location":"London"}'
Call get_current_weather '{"location":"New York"}'
Call get_current_weather '{"location":"New York"}'
Call get_current_weather '{"location":"New York, NY"}'
Call get_current_weather '{"location":"New York"}'
Call get_current_weather '{"location":"London"}'
Call get_current_weather '{"location":"New York"}'
Call get_current_weather '{"location":"London"}'
Call get_current_weather '{"location":"New York"}'

Debug log of a looping function call (non-streaming)

2024-06-10T15:32:22.443Z [DEBUG] aichat::client::openai_compatible: OpenAICompatible Chat Completions Request: http://localai.srvlan:8080/v1/chat/completions {"model":"llama3-8b-inst","messages":[{"role":"user","content":"what is the weather in London"}],"tools":[{"type":"function","function":{"name":"get_current_weather","description":"Get the current weather in a given location.","parameters":{"type":"object","properties":{"location":{"type":"string","description":"The city and optionally the state or country, e.g., \"London\", \"San Francisco, CA\"."}},"required":["location"]}}}],"tool_choice":"auto"}
2024-06-10T15:32:23.091Z [DEBUG] aichat::client::openai: non-stream-data: {"created":1718031283,"object":"chat.completion","id":"f5f63d03-9bd7-42ba-9e25-bfc35ed44f47","model":"llama3-8b-inst","choices":[{"index":0,"finish_reason":"tool_calls","message":{"role":"assistant","content":"","tool_calls":[{"index":0,"id":"f5f63d03-9bd7-42ba-9e25-bfc35ed44f47","type":"function","function":{"name":"get_current_weather","arguments":"{\"location\":\"New York\"}"}}]}}],"usage":{"prompt_tokens":275,"completion_tokens":19,"total_tokens":294}}
2024-06-10T15:32:23.302Z [DEBUG] aichat::client::openai_compatible: OpenAICompatible Chat Completions Request: http://localai.srvlan:8080/v1/chat/completions {"model":"llama3-8b-inst","messages":[{"role":"user","content":"what is the weather in London"},{"role":"assistant","content":"","tool_calls":[{"id":"f5f63d03-9bd7-42ba-9e25-bfc35ed44f47","type":"function","function":{"name":"get_current_weather","arguments":"{\"location\":\"New York\"}"}}]},{"role":"tool","content":"{\"output\":\"New+York: ☀️   🌡️+19°C 🌬️→4.7m/s\\n\"}","tool_call_id":"f5f63d03-9bd7-42ba-9e25-bfc35ed44f47"}],"tools":[{"type":"function","function":{"name":"get_current_weather","description":"Get the current weather in a given location.","parameters":{"type":"object","properties":{"location":{"type":"string","description":"The city and optionally the state or country, e.g., \"London\", \"San Francisco, CA\"."}},"required":["location"]}}}],"tool_choice":"auto"}
2024-06-10T15:32:24.004Z [DEBUG] aichat::client::openai: non-stream-data: {"created":1718031283,"object":"chat.completion","id":"f5f63d03-9bd7-42ba-9e25-bfc35ed44f47","model":"llama3-8b-inst","choices":[{"index":0,"finish_reason":"tool_calls","message":{"role":"assistant","content":"","tool_calls":[{"index":0,"id":"f5f63d03-9bd7-42ba-9e25-bfc35ed44f47","type":"function","function":{"name":"get_current_weather","arguments":"{\"location\":\"New York\"}"}}]}}],"usage":{"prompt_tokens":275,"completion_tokens":20,"total_tokens":295}}
2024-06-10T15:32:24.226Z [DEBUG] aichat::client::openai_compatible: OpenAICompatible Chat Completions Request: http://localai.srvlan:8080/v1/chat/completions {"model":"llama3-8b-inst","messages":[{"role":"user","content":"what is the weather in London"},{"role":"assistant","content":"","tool_calls":[{"id":"f5f63d03-9bd7-42ba-9e25-bfc35ed44f47","type":"function","function":{"name":"get_current_weather","arguments":"{\"location\":\"New York\"}"}},{"id":"f5f63d03-9bd7-42ba-9e25-bfc35ed44f47","type":"function","function":{"name":"get_current_weather","arguments":"{\"location\":\"New York\"}"}}]},{"role":"tool","content":"{\"output\":\"New+York: ☀️   🌡️+19°C 🌬️→4.7m/s\\n\"}","tool_call_id":"f5f63d03-9bd7-42ba-9e25-bfc35ed44f47"},{"role":"tool","content":"{\"output\":\"New+York: ☀️   🌡️+19°C 🌬️→4.7m/s\\n\"}","tool_call_id":"f5f63d03-9bd7-42ba-9e25-bfc35ed44f47"}],"tools":[{"type":"function","function":{"name":"get_current_weather","description":"Get the current weather in a given location.","parameters":{"type":"object","properties":{"location":{"type":"string","description":"The city and optionally the state or country, e.g., \"London\", \"San Francisco, CA\"."}},"required":["location"]}}}],"tool_choice":"auto"}
2024-06-10T15:32:24.917Z [DEBUG] aichat::client::openai: non-stream-data: {"created":1718031283,"object":"chat.completion","id":"f5f63d03-9bd7-42ba-9e25-bfc35ed44f47","model":"llama3-8b-inst","choices":[{"index":0,"finish_reason":"tool_calls","message":{"role":"assistant","content":"","tool_calls":[{"index":0,"id":"f5f63d03-9bd7-42ba-9e25-bfc35ed44f47","type":"function","function":{"name":"get_current_weather","arguments":"{\"location\":\"New York\"}"}}]}}],"usage":{"prompt_tokens":275,"completion_tokens":19,"total_tokens":294}}
2024-06-10T15:32:25.117Z [DEBUG] aichat::client::openai_compatible: OpenAICompatible Chat Completions Request: http://localai.srvlan:8080/v1/chat/completions {"model":"llama3-8b-inst","messages":[{"role":"user","content":"what is the weather in London"},{"role":"assistant","content":"","tool_calls":[{"id":"f5f63d03-9bd7-42ba-9e25-bfc35ed44f47","type":"function","function":{"name":"get_current_weather","arguments":"{\"location\":\"New York\"}"}},{"id":"f5f63d03-9bd7-42ba-9e25-bfc35ed44f47","type":"function","function":{"name":"get_current_weather","arguments":"{\"location\":\"New York\"}"}},{"id":"f5f63d03-9bd7-42ba-9e25-bfc35ed44f47","type":"function","function":{"name":"get_current_weather","arguments":"{\"location\":\"New York\"}"}}]},{"role":"tool","content":"{\"output\":\"New+York: ☀️   🌡️+19°C 🌬️→4.7m/s\\n\"}","tool_call_id":"f5f63d03-9bd7-42ba-9e25-bfc35ed44f47"},{"role":"tool","content":"{\"output\":\"New+York: ☀️   🌡️+19°C 🌬️→4.7m/s\\n\"}","tool_call_id":"f5f63d03-9bd7-42ba-9e25-bfc35ed44f47"},{"role":"tool","content":"{\"output\":\"New+York: ☀️   🌡️+19°C 🌬️→4.7m/s\\n\"}","tool_call_id":"f5f63d03-9bd7-42ba-9e25-bfc35ed44f47"}],"tools":[{"type":"function","function":{"name":"get_current_weather","description":"Get the current weather in a given location.","parameters":{"type":"object","properties":{"location":{"type":"string","description":"The city and optionally the state or country, e.g., \"London\", \"San Francisco, CA\"."}},"required":["location"]}}}],"tool_choice":"auto"}

Expected results

  1. Ollama support for function calling?
  2. Function calling should work in streaming mode.
  3. Non-streaming mode should not enter an infinite loop.

I will try to track down the infinite loop issue and create a PR when I have some time.

sigoden commented 3 months ago

The infinite loop is a bug in the LLM, but AIChat is also responsible for detecting dead loops and aborting.

Let me explain.

Let's first illustrate how a well-behaved LLM handles function calling (without parallel tool calls).

# First Request: Question only
[{"role":"user","content":"what is the weather in Pairs and Berlin?"}]

# First Response: Returns the first function call
{
  "content": null,
  "role": "assistant",
  "tool_calls": [
    {
      "function": {
        "arguments": "{\n  \"location\": \"Paris\"\n}",
        "name": "get_current_weather"
      },
      "id": "call_8e7VqAZILfH0CX4uPVW3lebx",
      "type": "function"
    }
  ]
}

# Second Request: Question + the result of the first function call
[
  {
    "role": "user",
    "content": "what is the weather in Pairs and Berlin?"
  },
  {
    "role": "assistant",
    "content": "",
    "tool_calls": [
      {
        "id": "call_8e7VqAZILfH0CX4uPVW3lebx",
        "type": "function",
        "function": {
          "name": "get_current_weather",
          "arguments": "{\n  \"location\": \"Paris\"\n}"
        }
      }
    ]
  },
  {
    "role": "tool",
    "content": "{\"output\":\"Paris: ☀️   🌡️+12°C 🌬️→1.1m/s\\n\"}",
    "tool_call_id": "call_8e7VqAZILfH0CX4uPVW3lebx"
  }
]

# Second Response: Returns the second function call
{
  "role": "assistant",
  "content": null,
  "tool_calls": [
    {
      "id": "call_clrUp5JmYoUr5WFGmf0t12Q8",
      "type": "function",
      "function": {
        "name": "get_current_weather",
        "arguments": "{\"location\":\"Berlin\"}"
      }
    }
  ]
}

# Third Request: Question + the results of both function calls
[
  {
    "role": "user",
    "content": "what is the weather in Pairs and Berlin?"
  },
  {
    "role": "assistant",
    "content": "",
    "tool_calls": [
      {
        "id": "call_8e7VqAZILfH0CX4uPVW3lebx",
        "type": "function",
        "function": {
          "name": "get_current_weather",
          "arguments": "{\n  \"location\": \"Paris\"\n}"
        }
      },
      {
        "id": "call_clrUp5JmYoUr5WFGmf0t12Q8",
        "type": "function",
        "function": {
          "name": "get_current_weather",
          "arguments": "{\"location\":\"Berlin\"}"
        }
      }
    ]
  },
  {
    "role": "tool",
    "content": "{\"output\":\"Paris: ☀️   🌡️+12°C 🌬️→1.1m/s\\n\"}",
    "tool_call_id": "call_8e7VqAZILfH0CX4uPVW3lebx"
  },
  {
    "role": "tool",
    "content": "{\"output\":\"Berlin: ☁️   🌡️+11°C 🌬️→7.2m/s\\n\"}",
    "tool_call_id": "call_clrUp5JmYoUr5WFGmf0t12Q8"
  }
]

# Third Response: Final Result
{
  "role": "assistant",
  "content": "The current weather in Paris is sunny with a temperature of +12°C and a wind speed of 1.1 m/s.\n\nIn Berlin, the current weather is cloudy with a temperature of +11°C and a wind speed of 7.2 m/s."
}

AIChat checks whether the response contains a function call. If so, it executes the function and sends the result, together with the previous function call results, back to the LLM for further processing, repeating until the response contains no function call.
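In code form, that loop looks roughly like the sketch below; the types and the `call_llm` / `run_tool` helpers are hypothetical stand-ins, not aichat's actual implementation:

```rust
// Sketch of the tool-call loop: keep calling the LLM and executing any
// requested functions until a response arrives with no tool calls.
#[derive(Clone)]
struct ToolCall {
    id: String,
    name: String,
    arguments: String,
}

enum Message {
    User(String),
    Assistant { content: String, tool_calls: Vec<ToolCall> },
    Tool { content: String, tool_call_id: String },
}

struct Response {
    content: Option<String>,
    tool_calls: Option<Vec<ToolCall>>,
}

// Hypothetical stand-ins for the HTTP client and the tool runner.
fn call_llm(_messages: &[Message]) -> Response { unimplemented!() }
fn run_tool(_name: &str, _arguments: &str) -> String { unimplemented!() }

fn chat_with_tools(mut messages: Vec<Message>) -> String {
    loop {
        let response = call_llm(&messages);
        match response.tool_calls {
            // No tool calls: this is the final answer.
            None => return response.content.unwrap_or_default(),
            Some(calls) => {
                // Keep the assistant turn that requested the calls...
                messages.push(Message::Assistant {
                    content: String::new(),
                    tool_calls: calls.clone(),
                });
                // ...then append one "tool" message per executed call.
                for call in calls {
                    let output = run_tool(&call.name, &call.arguments);
                    messages.push(Message::Tool {
                        content: output,
                        tool_call_id: call.id,
                    });
                }
            }
        }
    }
}
```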

Buggy LLMs cannot handle function calling correctly: their responses always contain function calls, eventually leading to an infinite loop.
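A straightforward guard is to cap the number of tool-call rounds and abort when the model repeats an identical call. A sketch of such a check (the constant is illustrative, not aichat's real limit):

```rust
use std::collections::HashSet;

const MAX_TOOL_ROUNDS: usize = 8; // illustrative cap, not aichat's real limit

/// Errors out when the round cap is exceeded or when the same
/// (name, arguments) pair repeats, so the caller can abort the loop.
fn check_round(
    seen: &mut HashSet<(String, String)>,
    round: usize,
    name: &str,
    arguments: &str,
) -> Result<(), String> {
    if round >= MAX_TOOL_ROUNDS {
        return Err(format!("aborted after {MAX_TOOL_ROUNDS} tool-call rounds"));
    }
    // `insert` returns false if the pair was already seen: a dead loop.
    if !seen.insert((name.to_string(), arguments.to_string())) {
        return Err(format!("repeated tool call: {name} {arguments}"));
    }
    Ok(())
}
```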

sigoden commented 3 months ago

@blob42

  1. Ollama does not support function calling yet.
  2. Does LocalAI's function calling in streaming mode follow the OpenAI specification?
  3. #585 will fix the bug.

blob42 commented 3 months ago

Thanks for the quick fix and the detailed explanation. I suspected the LLM, but even with Llama 70B I had the same issue.

Regarding LocalAI: yes, it should follow the OpenAI specification. I tried it with various OpenAI client libraries and it seemed to work.

Closing this issue since the loop bug was fixed.

Thanks for the fantastic work with aichat :)