jackmpcollins / magentic

Seamlessly integrate LLMs as Python functions
https://magentic.dev/
MIT License

Ollama tool calls / structured output via LiteLLM are unreliable #207

Open jackmpcollins opened 5 months ago

jackmpcollins commented 5 months ago

Follow-on from issue https://github.com/jackmpcollins/magentic/issues/194

Ollama models (via LiteLLM) are returning an incorrect function name in the tool call output, which causes magentic to fail to parse the response.

```python
from magentic import prompt
from magentic.chat_model.litellm_chat_model import LitellmChatModel
from pydantic import BaseModel, Field


class Superhero(BaseModel):
    name: str
    age: int = Field(description="The age of the hero, could be very old.")
    power: str = Field(examples=["Runs really fast"])
    enemies: list[str]


@prompt(
    "Create a Superhero named {name}. Use the return_superhero function. Make sure to use the correct function name.",
    model=LitellmChatModel("ollama_chat/llama3", api_base="http://localhost:11434"),
)
def create_superhero(name: str) -> Superhero: ...


create_superhero("Garden Man")
```

```
ValueError: Unknown tool call: {"id":"call_4ca84210-3b30-4cd6-a109-05044d703923","function":{"arguments":"{\"Garden Man\": {\"Name\": \"Garden Man\", \"Age\": 35, \"Power\": \"Can control plants and make them grow at an incredible rate\", \"Enemies\": [\"Pest Control\", \"Weed Killer\"]}}","name":"return_ super hero"},"type":"function","index":0}
```

I've tried a few variations of the prompt with llama3 to get it to use the correct function name, but it basically never gets it right.

In this simple case (one return type, no additional functions) we could patch over this by ignoring the returned name and assuming the output is for the return_superhero function, but that would not work in the more general case of multiple return types or functions.
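For illustration, a minimal sketch of that single-candidate workaround (hypothetical code, not magentic's actual parser; the `parse_single_tool_call` helper is made up for this example):

```python
import json

from pydantic import BaseModel


class Superhero(BaseModel):
    name: str
    age: int
    power: str
    enemies: list[str]


def parse_single_tool_call(tool_call: dict, only_type: type[BaseModel]) -> BaseModel:
    """With exactly one candidate return type, ignore the (possibly mangled)
    function name and validate the arguments directly against that type."""
    arguments = json.loads(tool_call["function"]["arguments"])
    return only_type.model_validate(arguments)
```

Even this would not help in the example above, since llama3 also nested and renamed the argument keys, and it breaks down entirely once multiple return types or functions are in play.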

The ultimate solution will require better support for tool calls from ollama and litellm. llama.cpp supports tool calls in their python client https://github.com/abetlen/llama-cpp-python#function-calling but this is not currently exposed in ollama's OpenAI-compatible API https://github.com/ollama/ollama/blob/main/docs/openai.md . I have opened a new github issue with Ollama for this https://github.com/ollama/ollama/issues/4386 . After that, LiteLLM will also require an update to make use of this.

jackmpcollins commented 2 months ago

Tools are now supported by Ollama, and it exposes an OpenAI-compatible API, so it should be possible to use Ollama via OpenaiChatModel by setting base_url.

https://ollama.com/blog/tool-support

EDIT: This will not work with magentic until Ollama supports streamed tool calls. I added tests in PR https://github.com/jackmpcollins/magentic/pull/281 which will pass once it does.
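Once that lands, pointing magentic at Ollama should look roughly like this (a sketch assuming a default local Ollama install and a tool-capable model such as llama3.1):

```python
from magentic import OpenaiChatModel, prompt
from pydantic import BaseModel


class Superhero(BaseModel):
    name: str
    age: int
    power: str
    enemies: list[str]


@prompt(
    "Create a Superhero named {name}.",
    model=OpenaiChatModel(
        model="llama3.1",                       # any tool-capable model pulled into Ollama
        api_key="ollama",                       # required by the client, ignored by Ollama
        base_url="http://localhost:11434/v1/",  # Ollama's OpenAI-compatible endpoint
    ),
)
def create_superhero(name: str) -> Superhero: ...
```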

jackmpcollins commented 2 months ago

Relevant Ollama github issues

igor17400 commented 4 days ago

@jackmpcollins, thank you very much for the update on this.

I’ve been trying to get it working with an open-source/local model, and I’ve been following your discussions across different issues. However, I’m encountering the following error:

````
Testing LLaMA (Litellm) Model...
--- OLLaMa with str ---
INFO:httpx:HTTP Request: POST http://localhost:11434/v1/chat/completions "HTTP/1.1 200 OK"
Response from LLaMA:
```json
{"subquestions": [
    {
        "id": 1,
        "question": "What is the most recent stock price quote for TSLA?",
        "depends_on": []
    }
]}
```

--- OLLaMa with SubQuestion ---
INFO:httpx:HTTP Request: POST http://localhost:11434/v1/chat/completions "HTTP/1.1 200 OK"
ERROR:__main__:Error with LLaMA model: A string was returned by the LLM but was not an allowed output type. Consider updating the prompt to encourage the LLM to "use the tool". Model output: '{"name": "return_list_of_subquestion", "parameters": {"value": "[\"What is the ticker symbol […]'
````

As you can see, I created two functions: generate_subquestions_from_query_with_str, to check that the model responds at all (this worked), and generate_subquestions_from_query, to test whether the model can return the response as list[SubQuestion] (this failed).

Note: I have tested the following models, and all exhibit the same issue: firefunction-v2:latest, mistral:latest, llama3.1:latest, llama3-groq-tool-use:latest, and llama3.1:70b.

Do you have any ideas on how to resolve this?

Here’s the relevant part of my code:

```python
from magentic.chat_model.retry_chat_model import RetryChatModel
from pydantic import BaseModel, Field, ValidationError
from typing import List
from magentic import (
    OpenaiChatModel,
    UserMessage,
    chatprompt,
    SystemMessage,
    prompt_chain,
)
import logging
import json

# Set up logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)


class SubQuestion(BaseModel):
    id: int = Field(description="The unique ID of the subquestion.")
    question: str = Field(description="The subquestion itself.")
    depends_on: List[int] = Field(
        description="The list of subquestion IDs whose answer is required to answer this subquestion.",
        default_factory=list,
    )


@chatprompt(
    SystemMessage(GENERATE_SUBQUESTION_SYSTEM_PROMPT_TEMPLATE),
    UserMessage("# User query\n{user_query}"),
    model=OpenaiChatModel(
        model="llama3.1:70b",
        api_key="ollama",
        base_url="http://localhost:11434/v1/",
    ),
)
def generate_subquestions_from_query(user_query: str) -> list[SubQuestion]: ...


@chatprompt(
    SystemMessage(GENERATE_SUBQUESTION_SYSTEM_PROMPT_TEMPLATE),
    UserMessage("# User query\n{user_query}"),
    model=OpenaiChatModel(
        model="llama3.1:70b",
        api_key="ollama",
        base_url="http://localhost:11434/v1/",
    ),
)
def generate_subquestions_from_query_with_str(user_query: str) -> str: ...


def test_llama_model():
    print("Testing LLaMA (Litellm) Model...")
    user_query = "What is the current stock price of TSLA?"
    try:
        print("--- OLLaMa with str ---")
        response = generate_subquestions_from_query_with_str(user_query)
        print("Response from LLaMA:")
        print(response)
        print("------------")

        print("--- OLLaMa with SubQuestion ---")
        response = generate_subquestions_from_query(user_query)
        print("Response from LLaMA:")
        print(response)
        print("------------")
    except Exception as e:
        logger.error(f"Error with LLaMA model: {e}")


# Run tests
test_llama_model()
```

Click to expand the GENERATE_SUBQUESTION_SYSTEM_PROMPT_TEMPLATE

````python
GENERATE_SUBQUESTION_SYSTEM_PROMPT_TEMPLATE = """\
Don't generate any comments, or "Notes". Return only the JSON with Markdown.
You are a world-class state-of-the-art agent called OpenBB Agent.
Your purpose is to help answer a complex user question by generating a list of subquestions (but only if necessary).
You must also specify the dependencies between subquestions, since sometimes one subquestion will require the outcome of another in order to fully answer.

## Guidelines
* Don't try to be too clever
* Assume Subquestions are answerable by a downstream agent using tools to lookup the information.
* You must generate at least 1 subquestion.
* Generate only the subquestions required to answer the user's question
* Generate as few subquestions as possible required to answer the user's question
* A subquestion may not depend on a subquestion that proceeds it (i.e. comes after it.)
* Assume tools can be used to look-up the answer to the subquestions (e.g., for market cap, just create a subquestion asking for the market cap rather than for the components to calculate it.)

### Example output
```json
{{"subquestions": [
    {{
        "id": 1,
        "question": "What are the latest financial statements of AMZN?",
        "depends_on": []
    }},
    {{
        "id": 2,
        "question": "What is the most recent revenue and profit margin of AMZN?",
        "depends_on": []
    }},
    {{
        "id": 3,
        "question": "What is the current price to earnings (P/E) ratio of AMZN?",
        "depends_on": []
    }},
    {{
        "id": 4,
        "question": "Who are the peers of AMZN?",
        "depends_on": []
    }},
    {{
        "id": 5,
        "question": "Which of AMZN's peers have the largest market cap?",
        "depends_on": [4]
    }}
]}}
```
"""
````

jackmpcollins commented 3 days ago

Hi @igor17400, the issue is that Ollama does not currently parse the tool calls out of the streamed response. So the model output in that error log

'{"name": "return_list_of_subquestion", "parame...

is in the text content part of the response, but should instead be in the tool calls part. The relevant Ollama issue from what I can see is https://github.com/ollama/ollama/issues/5796 . And there is a PR that looks promising https://github.com/ollama/ollama/pull/6452 .

Magentic uses streamed responses internally in OpenaiChatModel. I have created issue https://github.com/jackmpcollins/magentic/issues/353 to add an option to use non-streamed responses, which would be a workaround in this case.
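For context, a non-streamed tool call against Ollama's OpenAI-compatible endpoint would look roughly like this using the openai client directly (a sketch; the model name and tool schema are placeholders, not magentic internals):

```python
from openai import OpenAI

# Ollama's OpenAI-compatible endpoint; the API key is required by the client but ignored by Ollama.
client = OpenAI(base_url="http://localhost:11434/v1/", api_key="ollama")

response = client.chat.completions.create(
    model="llama3.1",  # any tool-capable model pulled into Ollama
    messages=[{"role": "user", "content": "Create a Superhero named Garden Man."}],
    tools=[
        {
            "type": "function",
            "function": {
                "name": "return_superhero",
                "description": "Return the superhero.",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "name": {"type": "string"},
                        "age": {"type": "integer"},
                        "power": {"type": "string"},
                        "enemies": {"type": "array", "items": {"type": "string"}},
                    },
                    "required": ["name", "age", "power", "enemies"],
                },
            },
        }
    ],
    stream=False,  # non-streamed: Ollama populates tool_calls here, but not yet when streaming
)
print(response.choices[0].message.tool_calls)
```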

Another option would be to try using Ollama via LiteLLM. I just tested this using the example in the description of this issue and ran into an error: litellm issue https://github.com/BerriAI/litellm/issues/6135.

So there is no simple workaround at the moment, but if any of the above issues get resolved it should allow you to use local models with structured output / tools.