simonw opened 2 weeks ago
For each one I'm going to try a tool for searching my own blog (I may add a QuickJS code execution tool in the future).
```python
import httpx


def blog_search(query):
    url = "https://datasette.simonwillison.net/simonwillisonblog.json"
    args = {
        "sql": """
        select
          blog_blogmark.id,
          blog_blogmark.link_url,
          blog_blogmark.link_title,
          blog_blogmark.commentary,
          blog_blogmark.created,
          blog_blogmark_fts.rank
        from
          blog_blogmark join blog_blogmark_fts
          on blog_blogmark.rowid = blog_blogmark_fts.rowid
        where
          blog_blogmark_fts match escape_fts(:search)
        order by
          rank
        limit
          5
        """,
        "_shape": "array",
        "search": query,
    }
    return httpx.get(url, params=args).json()
```
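With `"_shape": "array"` Datasette returns a plain JSON list of row objects, so the helper can be called directly; an illustrative usage (column names come from the SQL above):

```python
# Each result is a dict with the columns selected in the SQL query
for row in blog_search("pelicans"):
    print(row["created"], row["link_title"], row["link_url"])
```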
First, OpenAI: https://platform.openai.com/docs/guides/function-calling
```python
import json

import llm
import openai

client = openai.OpenAI(api_key=llm.get_key("", "openai"))

tools = [
    {
        "type": "function",
        "function": {
            "name": "search_blog",
            "description": "Search for posts on the blog.",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {
                        "type": "string",
                        "description": "Search query",
                    }
                },
                "required": ["query"],
                "additionalProperties": False,
            },
        },
    }
]

messages = []
messages.append(
    {"role": "user", "content": "Hi, what do you know about anthropic?"}
)
response = client.chat.completions.create(
    model="gpt-4o", messages=messages, tools=tools
)
```
Run the search, then feed the results back.
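A minimal sketch of that search step (this glue code isn't in the original comment; it just connects the tool call to the `blog_search()` helper defined earlier):

```python
# The model chose to call the tool rather than answer directly
tool_call = response.choices[0].message.tool_calls[0]
arguments = json.loads(tool_call.function.arguments)  # e.g. {"query": "anthropic"}
results = blog_search(arguments["query"])
```

Those `results` then get serialized into the tool result message: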
```python
function_call_result_message = {
    "role": "tool",
    "content": json.dumps(results),
    "tool_call_id": response.choices[0].message.tool_calls[0].id,
}
messages.append(response.choices[0].message.dict())
messages.append(function_call_result_message)
response2 = client.chat.completions.create(
    model="gpt-4o", messages=messages, tools=tools
)
print(response2.choices[0].message.content)
```
Lining up the `tool_call_id` is important or you get an error.
Anthropic: https://docs.anthropic.com/en/docs/build-with-claude/tool-use and https://github.com/anthropics/courses/blob/master/tool_use/04_complete_workflow.ipynb
Anthropic tools look similar to OpenAI ones, but instead of `parameters` they use `input_schema`, and they don't nest inside an outer object. So https://github.com/simonw/llm/issues/607#issuecomment-2456196254 looks like this instead:
```python
import anthropic

anthropic_tool = {
    "name": "search_blog",
    "description": "Search for posts on the blog.",
    "input_schema": {
        "type": "object",
        "properties": {
            "query": {
                "type": "string",
                "description": "Search query",
            }
        },
        "required": ["query"],
        "additionalProperties": False,
    },
}

anthropic_client = anthropic.Anthropic(api_key=llm.get_key("", "claude"))

messages = [{"role": "user", "content": "Tell me about pelicans"}]
response = anthropic_client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    tools=[anthropic_tool],
    messages=messages,
)
```
Which returned:

```python
Message(
    id="msg_01FuLBrwiQih4aY4WSxAXGnj",
    content=[
        TextBlock(text="I'll search the blog for posts about pelicans.", type="text"),
        ToolUseBlock(
            id="toolu_01YSaFvtW3mjbrg8hSGH7FkZ",
            input={"query": "pelicans"},
            name="search_blog",
            type="tool_use",
        ),
    ],
    model="claude-3-5-sonnet-20241022",
    role="assistant",
    stop_reason="tool_use",
    stop_sequence=None,
    type="message",
    usage=Usage(input_tokens=394, output_tokens=69),
)
```
And now:
```python
messages.append({
    "role": "assistant",
    "content": [r.dict() for r in response.content]
})
results = blog_search(response.content[-1].input["query"])
tool_response = {
    "role": "user",
    "content": [
        {
            "type": "tool_result",
            "tool_use_id": response.content[-1].id,
            "content": json.dumps(results)
        }
    ]
}
messages.append(tool_response)
response2 = anthropic_client.messages.create(
    model="claude-3-sonnet-20240229",
    messages=messages,
    max_tokens=1024,
    tools=[anthropic_tool]
)
print(response2.content[0].text)
```
The blog posts provide some interesting information about pelicans:
- Pelicans are a group of aquatic birds known for their distinctive pouch-like bills that they use to scoop up fish and other prey from the water.
- Common species include the Brown Pelican found in the Americas that plunges into the water from heights to catch fish, and the American White Pelican with white plumage and a large pink bill found in the Americas and Eurasia.
- The Brown Pelican is one of the largest pelican species, with an average height around 26-30 inches and bills up to 11 inches long.
- One blog featured creative AI-generated images and descriptions of pelicans riding bicycles by different language models as a novel benchmark test.
- Another blog highlighted using an AI to have pelican personas discuss topics like data journalism and video analysis in a pelican newscaster style.
- There is work on compact but capable language models like SmolLM2 that can discuss topics like pelicans while running efficiently on devices.
So in summary, the blogs cover pelican biology, using pelican prompts to creatively test language models, and developing efficient on-device models that can still discuss pelicans knowledgeably. Let me know if you need any other details!
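That two-step exchange generalizes to a loop that keeps satisfying tool requests until the model stops making them; a rough sketch assembled from the pieces above (this loop isn't in the original comment):

```python
def run_with_tools(client, messages, tools):
    # Keep calling the model, running tools, until it answers in plain text
    while True:
        response = client.messages.create(
            model="claude-3-5-sonnet-20241022",
            max_tokens=1024,
            tools=tools,
            messages=messages,
        )
        if response.stop_reason != "tool_use":
            return response
        # Echo the assistant turn (including its tool_use block) back to the model
        messages.append({"role": "assistant", "content": [b.dict() for b in response.content]})
        tool_use = next(b for b in response.content if b.type == "tool_use")
        results = blog_search(tool_use.input["query"])
        messages.append({
            "role": "user",
            "content": [{
                "type": "tool_result",
                "tool_use_id": tool_use.id,
                "content": json.dumps(results),
            }],
        })
```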
Gemini calls it function calling: https://ai.google.dev/gemini-api/docs/function-calling and https://ai.google.dev/gemini-api/docs/function-calling/tutorial (which has `curl` examples). Here's a useful `curl` example from that page (https://ai.google.dev/gemini-api/docs/function-calling):
curl "https://generativelanguage.googleapis.com/v1beta/models/gemini-pro:generateContent?key=$GEMINI_API_KEY" \
-H 'Content-Type: application/json' \
-d '
{
"contents": [
{
"role": "user",
"parts": [
{
"text": "Which theaters in Mountain View show Barbie movie?"
}
]
},
{
"role": "model",
"parts": [
{
"functionCall": {
"name": "find_theaters",
"args": {
"location": "Mountain View, CA",
"movie": "Barbie"
}
}
}
]
},
{
"role": "user",
"parts": [
{
"functionResponse": {
"name": "find_theaters",
"response": {
"name": "find_theaters",
"content": {
"movie": "Barbie",
"theaters": [
{
"name": "AMC Mountain View 16",
"address": "2000 W El Camino Real, Mountain View, CA 94040"
},
{
"name": "Regal Edwards 14",
"address": "245 Castro St, Mountain View, CA 94040"
}
]
}
}
}
}
]
}
],
"tools": [
{
"functionDeclarations": [
{
"name": "find_movies",
"description": "find movie titles currently playing in theaters based on any description, genre, title words, etc.",
"parameters": {
"type": "OBJECT",
"properties": {
"location": {
"type": "STRING",
"description": "The city and state, e.g. San Francisco, CA or a zip code e.g. 95616"
},
"description": {
"type": "STRING",
"description": "Any kind of description including category or genre, title words, attributes, etc."
}
},
"required": [
"description"
]
}
},
{
"name": "find_theaters",
"description": "find theaters based on location and optionally movie title which is currently playing in theaters",
"parameters": {
"type": "OBJECT",
"properties": {
"location": {
"type": "STRING",
"description": "The city and state, e.g. San Francisco, CA or a zip code e.g. 95616"
},
"movie": {
"type": "STRING",
"description": "Any movie title"
}
},
"required": [
"location"
]
}
},
{
"name": "get_showtimes",
"description": "Find the start times for movies playing in a specific theater",
"parameters": {
"type": "OBJECT",
"properties": {
"location": {
"type": "STRING",
"description": "The city and state, e.g. San Francisco, CA or a zip code e.g. 95616"
},
"movie": {
"type": "STRING",
"description": "Any movie title"
},
"theater": {
"type": "STRING",
"description": "Name of the theater"
},
"date": {
"type": "STRING",
"description": "Date for requested showtime"
}
},
"required": [
"location",
"movie",
"theater",
"date"
]
}
}
]
}
]
}
'
```
I got back:
```json
{
"candidates": [
{
"content": {
"parts": [
{
"text": "OK. I found two theaters in Mountain View showing Barbie: AMC Mountain View 16 and Regal Edwards 14."
}
],
"role": "model"
},
"finishReason": "STOP",
"index": 0,
"safetyRatings": [
{
"category": "HARM_CATEGORY_SEXUALLY_EXPLICIT",
"probability": "NEGLIGIBLE"
},
{
"category": "HARM_CATEGORY_HATE_SPEECH",
"probability": "NEGLIGIBLE"
},
{
"category": "HARM_CATEGORY_HARASSMENT",
"probability": "NEGLIGIBLE"
},
{
"category": "HARM_CATEGORY_DANGEROUS_CONTENT",
"probability": "NEGLIGIBLE"
}
]
}
],
"usageMetadata": {
"promptTokenCount": 448,
"candidatesTokenCount": 25,
"totalTokenCount": 473
},
"modelVersion": "gemini-pro"
}
```
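The same endpoint is easy to call without an SDK. Here's a rough httpx sketch of just the first turn, reusing the request shape from the curl example (the `GEMINI_API_KEY` environment variable is assumed, and the tools list is trimmed to a single declaration):

```python
import os

import httpx

find_theaters = {
    "name": "find_theaters",
    "description": "find theaters based on location and optionally movie title which is currently playing in theaters",
    "parameters": {
        "type": "OBJECT",
        "properties": {
            "location": {"type": "STRING", "description": "The city and state, e.g. San Francisco, CA"},
            "movie": {"type": "STRING", "description": "Any movie title"},
        },
        "required": ["location"],
    },
}

response = httpx.post(
    "https://generativelanguage.googleapis.com/v1beta/models/gemini-pro:generateContent",
    params={"key": os.environ["GEMINI_API_KEY"]},
    json={
        "contents": [
            {"role": "user", "parts": [{"text": "Which theaters in Mountain View show Barbie movie?"}]}
        ],
        "tools": [{"functionDeclarations": [find_theaters]}],
    },
)
# The model should reply with a functionCall part asking us to run find_theaters
print(response.json()["candidates"][0]["content"]["parts"][0])
```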
Decided to see if I could figure it out for `llama-cpp-python` and `llm-gguf`. Managed to get this working:

```bash
llm -m Hermes-3-Llama-3.1-8B 'tell me what the blog says about pelicans' --no-stream -s 'you derive keywords from questions and search for them'
```
And these code changes:
```diff
         if self._model is None:
-            self._model = Llama(
-                model_path=self.model_path, verbose=False, n_ctx=0  # "0 = from model"
-            )
+            self._model = Llama(
+                model_path=self.model_path, verbose=False, n_ctx=self.n_ctx,
+                chat_format="chatml-function-calling"
+            )
@@ -171,7 +221,27 @@ class GgufChatModel(llm.Model):
         if not stream:
             model = self.get_model()
-            completion = model.create_chat_completion(messages=messages)
+            completion = model.create_chat_completion(messages=messages, tools=[
+                {
+                    "type": "function",
+                    "function": {
+                        "name": "search_blog",
+                        "description": "Search for posts on the blog.",
+                        "parameters": {
+                            "type": "object",
+                            "properties": {
+                                "query": {
+                                    "type": "string",
+                                    "description": "Search query, keywords only",
+                                }
+                            },
+                            "required": ["query"],
+                            "additionalProperties": False,
+                        },
+                    }
+                }
+            ], tool_choice="auto")
+            breakpoint()
```
Which gave me this for `completion` at the breakpoint:
```json
{
"id": "chatcmpl-60256c0d-744d-449a-a433-3faa6224f770",
"object": "chat.completion",
"created": 1730782682,
"model": "/Users/simon/Library/Application Support/io.datasette.llm/gguf/models/Hermes-3-Llama-3.1-8B.Q4_K_M.gguf",
"choices": [
{
"finish_reason": "tool_calls",
"index": 0,
"logprobs": null,
"message": {
"role": "assistant",
"content": null,
"tool_calls": [
{
"id": "call__0_search_blog_cmpl-c4456f81-e402-4f82-ab4d-6063c0b894ff",
"type": "function",
"function": {
"name": "search_blog",
"arguments": "{\"query\": \"pelicans blog\"}"
}
}
],
"function_call": {
"name": "search_blog:",
"arguments": "{\"query\": \"pelicans blog\"}"
}
}
}
],
"usage": {
"completion_tokens": 8,
"prompt_tokens": 202,
"total_tokens": 210
}
}
```
So it looks like this is feasible, using that `chat_format="chatml-function-calling"` option. The `tool_choice="auto"` was necessary too.
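Outside of the plugin, the same behaviour can be reproduced with `llama-cpp-python` directly; a condensed sketch (the model path is a placeholder, everything else is copied from the diff above):

```python
from llama_cpp import Llama

llama = Llama(
    model_path="Hermes-3-Llama-3.1-8B.Q4_K_M.gguf",  # placeholder path
    verbose=False,
    n_ctx=0,  # "0 = from model"
    chat_format="chatml-function-calling",
)
completion = llama.create_chat_completion(
    messages=[
        {"role": "system", "content": "you derive keywords from questions and search for them"},
        {"role": "user", "content": "tell me what the blog says about pelicans"},
    ],
    tools=[{
        "type": "function",
        "function": {
            "name": "search_blog",
            "description": "Search for posts on the blog.",
            "parameters": {
                "type": "object",
                "properties": {"query": {"type": "string", "description": "Search query, keywords only"}},
                "required": ["query"],
            },
        },
    }],
    tool_choice="auto",
)
print(completion["choices"][0]["message"]["tool_calls"])
```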
OK, I now have function calling examples for OpenAI, Anthropic, Gemini and Llama.cpp. That's probably enough.
The most complex examples are the ones that need to persist and then re-send a tool ID (OpenAI and Anthropic).
Is this going to be only about the implementation of function calling from major providers, or more of a discussion of further methods as well? I have been using function calling in pure Python for a while and have seen a significant performance improvement, especially in sub-20B models. I first started prototyping with this prompt and then settled on the implementation in agento, with this system prompt and this engine implementation, inside the `execute_python_code` method. An example function call looks like this:
```
╭────────────────────────────────────────────────────────╮
user │ Can you get 4 apples, eat 1 of them and sell the rest? │
╰────────────────────────────────────────────────────────╯
╭─────────────────────────────────────────────────╮
Apple │ ```python │
Agent │ apples = get_apples(4) │
│ apples_after_eating = eat_apples(apples, 1) │
│ money_earned = sell_apples(apples_after_eating) │
│ ``` │
╰─────────────────────────────────────────────────╯
```
where `get_apples`, `eat_apples` and `sell_apples` are defined like the following:
```python
from typing import List


def get_apples(quantity: int) -> List[str]:
    """
    Get a certain quantity of apples.

    Args:
        quantity (int): The quantity of apples to get.

    Returns:
        List[str]: A list of apples.
    """
    return ["Apple" for _ in range(quantity)]


def eat_apples(apples: List[str], quantity: int) -> List[str]:
    """
    Eat a certain quantity of apples.

    Args:
        apples (List[str]): A list of apples.
        quantity (int): The quantity of apples to eat.

    Returns:
        List[str]: The remaining apples.
    """
    return apples[quantity:] if quantity < len(apples) else []


def sell_apples(apples: List[str]) -> str:
    """
    Sell all the apples provided.

    Args:
        apples (List[str]): A list of apples.

    Returns:
        str: The money earned from selling the apples.
    """
    return f"${len(apples) * 1}"
```
My goal is to expand the model plugin mechanism so that new models can be registered that support tool usage. Ideally this would enable people to write their own plugins that implement tool usage via prompting if they want to.
Following.

This is promising, subscribed.
In gptme I'm using a tool calling format based on markdown codeblocks in the normal text output.
It predates tool calling APIs, so it works by detecting tool calls as the output is streamed and interrupting the stream once a complete, valid tool call has been produced.
Example, to save a file `hello.txt`:
```save hello.txt
Hello world
```
To run ipython, where functions can be registered:
```ipython
search("search query", engine="duckduckgo")
```
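This isn't gptme's actual implementation, but the stream-and-interrupt idea described above can be sketched roughly like this (the chunk stream and the registered tool names are illustrative):

```python
import re

# A complete tool call is a closed codeblock whose "language" names a registered tool
TOOL_BLOCK = re.compile(r"`{3}(save|ipython)([^\n]*)\n(.*?)\n`{3}", re.DOTALL)


def consume_stream(chunks):
    """Accumulate streamed text and stop as soon as a complete tool codeblock appears."""
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        match = TOOL_BLOCK.search(buffer)
        if match:
            # Interrupt generation here and run the tool
            return match.group(1), match.group(2).strip(), match.group(3)
    return None


# Illustrative chunks, as if streamed from a model
print(consume_stream(["Sure, saving it:\n```save hel", "lo.txt\nHello world\n``", "`\n"]))
# ('save', 'hello.txt', 'Hello world')
```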
You can find the full system prompt and detailed examples here: https://gptme.org/docs/prompts.html
I've also worked on an XML format of this (https://github.com/ErikBjare/gptme/pull/121), as well as support for the actual tool calling APIs now available via OpenAI, Anthropic, OpenRouter, and Ollama (https://github.com/ErikBjare/gptme/pull/219).
> In gptme I'm using a tool calling format based on markdown codeblocks in the normal text output.
I like that a lot. Feels like the kind of format that any capable LLM could be convinced to output, and it's very easy to parse. I'll think about how that might be supported.
Note that ollama supports the same tool calling API as OpenAI: https://ollama.com/blog/tool-support
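If so, the OpenAI example from earlier in this thread should work against a local Ollama server largely unchanged; a hedged sketch (the `llama3.1` model name and the local endpoint are assumptions about Ollama's OpenAI-compatible API, and `tools` is the list defined in the OpenAI example above):

```python
import openai

# Point the regular OpenAI client at Ollama's OpenAI-compatible endpoint
ollama_client = openai.OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")
response = ollama_client.chat.completions.create(
    model="llama3.1",  # assumed: any local model with tool calling support
    messages=[{"role": "user", "content": "Hi, what do you know about anthropic?"}],
    tools=tools,  # same tools list as in the OpenAI example above
)
print(response.choices[0].message.tool_calls)
```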
Using the markdown-style format like gptme uses is a great idea, but I think it's also worth supporting the native function calling formats for each model, since the models are trained on those formats specifically and we can expect them to work.
My suggestion would be to have a class that abstracts over the model-native formats: each model can implement its own `to_json` method, and `llm` provides a `to_markdown` method as a backup.
Here's some scratch code of what I would suggest:
```python
from abc import ABC


class ToolInterface(ABC):
    def call(self): pass
    def doc(self): pass


class MyTool(ToolInterface):
    def call(self):
        ...  # implement the tool logic

    def doc(self):
        ...  # return a dict that describes the tool


class ModelWithJsonTools:
    def __init__(self, tools, **kwargs):
        self.tools = tools

    def get_tools(self):
        # Returns the doc for all the tools in the JSON format specific to this model.
        # format_tool_spec could be a shared function, or specific to the model/plugin.
        return json.dumps([format_tool_spec(tool) for tool in self.tools])


a = llm.get_model(ModelWithJsonTools, tools=[MyTool])


class ModelWithMarkdownTools:
    # Same as ModelWithJsonTools except:
    def get_tools(self):
        # llm should provide the markdown spec for a consistent format
        return llm.tools.format_markdown_spec(self.tools)


b = llm.get_model(ModelWithMarkdownTools, tools=[MyTool])
```
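For an OpenAI-style model, `format_tool_spec` could just wrap whatever `doc()` returns in the function-tool envelope shown at the top of this thread; a sketch, assuming `doc()` returns `name`, `description` and `parameters` keys:

```python
def format_tool_spec(tool):
    # Wrap the tool's self-description in the OpenAI-style "function" envelope
    doc = tool.doc()
    return {
        "type": "function",
        "function": {
            "name": doc["name"],
            "description": doc["description"],
            "parameters": doc["parameters"],
        },
    }
```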
The markdown spec that gptme uses could be borrowed; it looks good.
Maybe this functionality should be implemented in a plugin like `llm-tools`. Is that possible or advisable?
I'm starting this research thread to drop in examples of tool usage across different LLMs, to help inform an `llm` feature for that.