jackmpcollins opened 6 months ago
Tools are now supported by Ollama, and it exposes an OpenAI-compatible API, so it should be possible to use Ollama via `OpenaiChatModel` by setting the `base_url`.

https://ollama.com/blog/tool-support
EDIT: This needs to wait for streamed tool call support in Ollama to work with magentic. I added tests in PR https://github.com/jackmpcollins/magentic/pull/281 which will pass once that is supported.
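In principle the setup is just a matter of pointing `OpenaiChatModel` at the local server. A minimal sketch (assuming Ollama is running on the default port with a model already pulled; plain text output works today, while structured outputs depend on the streamed tool call support mentioned in the edit above):

```python
# Minimal sketch: magentic's OpenaiChatModel pointed at Ollama's
# OpenAI-compatible endpoint. Assumes `ollama pull llama3.1` has been run
# and the server is listening on the default port.
from magentic import OpenaiChatModel, prompt


@prompt(
    "Say hello to {name}.",
    model=OpenaiChatModel(
        model="llama3.1",
        api_key="ollama",  # any non-empty string; Ollama does not check it
        base_url="http://localhost:11434/v1/",
    ),
)
def say_hello(name: str) -> str: ...


print(say_hello("world"))
```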
@jackmpcollins, thank you very much for the update on this.
I’ve been trying to get it working with an open-source/local model, and I’ve been following your discussions across different issues. However, I’m encountering the following error:
````
Testing LLaMA (Litellm) Model...
--- OLLaMa with str ---
INFO:httpx:HTTP Request: POST http://localhost:11434/v1/chat/completions "HTTP/1.1 200 OK"
Response from LLaMA:
```json
{"subquestions": [
    {
        "id": 1,
        "question": "What is the most recent stock price quote for TSLA?",
        "depends_on": []
    }
]}
```
--- OLLaMa with SubQuestion ---
INFO:httpx:HTTP Request: POST http://localhost:11434/v1/chat/completions "HTTP/1.1 200 OK"
ERROR:__main__:Error with LLaMA model: A string was returned by the LLM but was not an allowed output type. Consider updating the prompt to encourage the LLM to "use the tool". Model output: '{"name": "return_list_of_subquestion", "parameters": {"value": "[\"What is the ticker symbol [...]'
````
As you can see, I created two methods: one called `generate_subquestions_from_query_with_str` to check whether the model responds appropriately, which worked successfully, and another named `generate_subquestions_from_query` to test whether the model can convert the response to `list[SubQuestion]`, which failed.
Note: I have tested the following models, and all exhibit the same issue: `firefunction-v2:latest`, `mistral:latest`, `llama3.1:latest`, `llama3-groq-tool-use:latest`, and `llama3.1:70b`.
Do you have any ideas on how to resolve this?
Here’s the relevant part of my code:
```python
from magentic.chat_model.retry_chat_model import RetryChatModel
from pydantic import BaseModel, Field, ValidationError
from typing import List
from magentic import (
    OpenaiChatModel,
    UserMessage,
    chatprompt,
    SystemMessage,
    prompt_chain,
)
import logging
import json

# Set up logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

# The actual system prompt is defined elsewhere; placeholder shown here so the
# snippet is self-contained.
GENERATE_SUBQUESTION_SYSTEM_PROMPT_TEMPLATE = "..."


class SubQuestion(BaseModel):
    id: int = Field(description="The unique ID of the subquestion.")
    question: str = Field(description="The subquestion itself.")
    depends_on: List[int] = Field(
        description="The list of subquestion IDs whose answer is required to answer this subquestion.",
        default_factory=list,
    )


@chatprompt(
    SystemMessage(GENERATE_SUBQUESTION_SYSTEM_PROMPT_TEMPLATE),
    UserMessage("# User query\n{user_query}"),
    model=OpenaiChatModel(
        model="llama3.1:70b",
        api_key="ollama",
        base_url="http://localhost:11434/v1/",
    ),
)
def generate_subquestions_from_query(user_query: str) -> list[SubQuestion]: ...


@chatprompt(
    SystemMessage(GENERATE_SUBQUESTION_SYSTEM_PROMPT_TEMPLATE),
    UserMessage("# User query\n{user_query}"),
    model=OpenaiChatModel(
        model="llama3.1:70b",
        api_key="ollama",
        base_url="http://localhost:11434/v1/",
    ),
)
def generate_subquestions_from_query_with_str(user_query: str) -> str: ...


def test_llama_model():
    print("Testing LLaMA (Litellm) Model...")
    user_query = "What is the current stock price of TSLA?"
    try:
        print("--- OLLaMa with str ---")
        response = generate_subquestions_from_query_with_str(user_query)
        print("Response from LLaMA:")
        print(response)
        print("------------")

        print("--- OLLaMa with SubQuestion ---")
        response = generate_subquestions_from_query(user_query)
        print("Response from LLaMA:")
        print(response)
        print("------------")
    except Exception as e:
        logger.error(f"Error with LLaMA model: {e}")


# Run tests
test_llama_model()
```
Hi @igor17400, the issue is that Ollama does not currently parse the tool calls from the streamed response. So the model output in that error log `'{"name": "return_list_of_subquestion", "parame...'` is in the text content part of the response, but should instead be in the tool calls part. The relevant Ollama issue, from what I can see, is https://github.com/ollama/ollama/issues/5796, and there is a PR that looks promising: https://github.com/ollama/ollama/pull/6452.
Magentic uses streamed responses internally in `OpenaiChatModel`. I have created issue https://github.com/jackmpcollins/magentic/issues/353 to add the option to use non-streamed responses, which would be a workaround in this case.
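To see the difference directly, here is a small probe (outside magentic, using the `openai` client against Ollama's OpenAI-compatible API; the tool schema and model name are just examples) that compares what comes back on the non-streamed vs. streamed paths:

```python
# Probe: compare where Ollama puts the tool call for non-streamed vs. streamed
# chat completions. Assumes Ollama is running locally with llama3.1 pulled.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1/", api_key="ollama")

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_stock_price",
            "description": "Get the latest stock price for a ticker symbol.",
            "parameters": {
                "type": "object",
                "properties": {"ticker": {"type": "string"}},
                "required": ["ticker"],
            },
        },
    }
]
messages = [{"role": "user", "content": "What is the current stock price of TSLA?"}]

# Non-streamed: a parsed tool call should appear in message.tool_calls.
response = client.chat.completions.create(model="llama3.1", messages=messages, tools=tools)
message = response.choices[0].message
print("non-streamed tool_calls:", message.tool_calls)
print("non-streamed content:", message.content)

# Streamed: if Ollama does not parse tool calls from the stream, the JSON
# shows up in delta.content instead of delta.tool_calls.
stream = client.chat.completions.create(
    model="llama3.1", messages=messages, tools=tools, stream=True
)
for chunk in stream:
    if not chunk.choices:
        continue
    delta = chunk.choices[0].delta
    if delta.tool_calls:
        print("delta.tool_calls:", delta.tool_calls)
    if delta.content:
        print("delta.content:", delta.content, end="")
```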
Another option would be to try using Ollama via LiteLLM. I just tested this using the example in the description of this issue and ran into an error (litellm issue https://github.com/BerriAI/litellm/issues/6135).
So there is no simple workaround at the moment, but if any of the above issues get resolved it should allow you to use local models with structured output / tools.
Follow-on from issue https://github.com/jackmpcollins/magentic/issues/194
Ollama models (via LiteLLM) are returning an incorrect function name in the tool call output, which causes magentic to fail to parse it.
I've tried a few variations of the prompt using `llama3` to get it to use the correct function name, but it basically never gets this right. In this simple case (one return type, no functions) we could patch over this by ignoring the name / assuming the output is for the `return_superhero` function, but that would not work for the more general case of multiple return types or functions.
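For context, the failing setup is roughly the sketch below. The names are illustrative stand-ins, and it assumes magentic's LiteLLM backend (`LitellmChatModel`) pointed at an Ollama-served llama3; the structured return type is offered to the model as a tool call named `return_superhero`, and the failure is that the model replies with some other function name.

```python
# Illustrative sketch (names are stand-ins; assumes magentic's LiteLLM backend
# and a locally served llama3 via Ollama, which LiteLLM reaches on its default
# http://localhost:11434).
from pydantic import BaseModel

from magentic import prompt
from magentic.chat_model.litellm_chat_model import LitellmChatModel


class Superhero(BaseModel):
    name: str
    powers: list[str]


@prompt(
    "Create a superhero named {name}.",
    model=LitellmChatModel("ollama_chat/llama3"),
)
def create_superhero(name: str) -> Superhero: ...


# The Superhero return type is requested via a tool named "return_superhero";
# if the model emits a different function name, the tool call cannot be
# matched and magentic fails to parse it.
create_superhero("Lightning Lad")
```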
The ultimate solution will require better support for tool calls from Ollama and LiteLLM. llama.cpp supports tool calls in its Python client (https://github.com/abetlen/llama-cpp-python#function-calling), but this is not currently exposed in Ollama's OpenAI-compatible API (https://github.com/ollama/ollama/blob/main/docs/openai.md). I have opened a new GitHub issue with Ollama for this: https://github.com/ollama/ollama/issues/4386. After that, LiteLLM will also require an update to make use of this.