simonw / llm

Access large language models from the command-line
https://llm.datasette.io
Apache License 2.0

Tool usage research #607

Open simonw opened 2 weeks ago

simonw commented 2 weeks ago

I'm starting this research thread to drop in examples of tool usage across different LLMs, to help inform an llm feature for that.

simonw commented 2 weeks ago

For each one I'm going to try a tool for searching my own blog (I may add a QuickJS code execution tool in the future).

simonw commented 2 weeks ago

I'll search my blogmarks with this API: https://datasette.simonwillison.net/simonwillisonblog.json?sql=select%0D%0A++blog_blogmark.id%2C%0D%0A++blog_blogmark.link_url%2C%0D%0A++blog_blogmark.link_title%2C%0D%0A++blog_blogmark.commentary%2C%0D%0A++blog_blogmark.created%2C%0D%0A++blog_blogmark_fts.rank%0D%0Afrom%0D%0A++blog_blogmark+join+blog_blogmark_fts%0D%0A++on+blog_blogmark.rowid+%3D+blog_blogmark_fts.rowid%0D%0Awhere%0D%0A++blog_blogmark_fts+match+escape_fts(%3Asearch)%0D%0Aorder+by%0D%0A++rank%0D%0Alimit%0D%0A++10&_shape=array&search=openai+functions

import httpx

def blog_search(query):
    "Run a full-text search against my blogmarks and return the matching rows."
    url = "https://datasette.simonwillison.net/simonwillisonblog.json"
    args = {
        "sql": """
        select
            blog_blogmark.id,
            blog_blogmark.link_url,
            blog_blogmark.link_title,
            blog_blogmark.commentary,
            blog_blogmark.created,
            blog_blogmark_fts.rank
        from
            blog_blogmark join blog_blogmark_fts
            on blog_blogmark.rowid = blog_blogmark_fts.rowid
        where
            blog_blogmark_fts match escape_fts(:search)
        order by
            rank
        limit
            5
        """,
        "_shape": "array",  # return a plain JSON array of row objects
        "search": query,
    }
    return httpx.get(url, params=args).json()
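
A quick sanity check of the helper (each row comes back as a dict keyed by the columns in the SQL above):

rows = blog_search("pelicans")
print(rows[0]["link_title"], rows[0]["link_url"])
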
simonw commented 2 weeks ago

First, OpenAI: https://platform.openai.com/docs/guides/function-calling

import json
import llm
import openai

client = openai.OpenAI(api_key=llm.get_key("", "openai"))

tools = [
    {
        "type": "function",
        "function": {
            "name": "search_blog",
            "description": "Search for posts on the blog.",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {
                        "type": "string",
                        "description": "Search query",
                    }
                },
                "required": ["query"],
                "additionalProperties": False,
            },
        },
    }
]

messages = []
messages.append(
    {"role": "user", "content": "Hi, what do you know about anthropic?"}
)
response = client.chat.completions.create(
    model="gpt-4o", messages=messages, tools=tools
)

Run the search using the arguments from the tool call, then send the results back:
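
Extracting those arguments looks like this (a sketch that assumes the response contains exactly one tool call):

tool_call = response.choices[0].message.tool_calls[0]
# The arguments arrive as a JSON-encoded string
results = blog_search(json.loads(tool_call.function.arguments)["query"])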

function_call_result_message = {
    "role": "tool",
    "content": json.dumps(results),
    "tool_call_id": response.choices[0].message.tool_calls[0].id,
}
messages.append(response.choices[0].message.dict())  # .model_dump() on Pydantic 2
messages.append(function_call_result_message)
response2 = client.chat.completions.create(
    model="gpt-4o", messages=messages, tools=tools
)
print(response2.choices[0].message.content)

Lining up the tool_call_id is important, or you get an error.

simonw commented 2 weeks ago

Anthropic: https://docs.anthropic.com/en/docs/build-with-claude/tool-use and https://github.com/anthropics/courses/blob/master/tool_use/04_complete_workflow.ipynb

Anthropic tools look similar to OpenAI ones, but they use input_schema instead of parameters and they don't nest inside an outer "function" object. So https://github.com/simonw/llm/issues/607#issuecomment-2456196254 looks like this instead:

anthropic_tool = {
    "name": "search_blog",
    "description": "Search for posts on the blog.",
    "input_schema": {
        "type": "object",
        "properties": {
            "query": {
                "type": "string",
                "description": "Search query",
            }
        },
        "required": ["query"],
        "additionalProperties": False,
    },
}

import anthropic

anthropic_client = anthropic.Anthropic(api_key=llm.get_key("", "claude"))

messages = [{"role": "user", "content": "Tell me about pelicans"}]

response = anthropic_client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    tools=[anthropic_tool],
    messages=messages,
)

The response:

Message(
    id="msg_01FuLBrwiQih4aY4WSxAXGnj",
    content=[
        TextBlock(text="I'll search the blog for posts about pelicans.", type="text"),
        ToolUseBlock(
            id="toolu_01YSaFvtW3mjbrg8hSGH7FkZ",
            input={"query": "pelicans"},
            name="search_blog",
            type="tool_use",
        ),
    ],
    model="claude-3-5-sonnet-20241022",
    role="assistant",
    stop_reason="tool_use",
    stop_sequence=None,
    type="message",
    usage=Usage(input_tokens=394, output_tokens=69),
)

And now:

messages.append({
    "role": "assistant",
    "content": [r.dict() for r in response.content]
})

results = blog_search(response.content[-1].input["query"])

tool_response = {
    "role": "user",
    "content": [
        {
            "type": "tool_result",
            "tool_use_id": response.content[-1].id,
            "content": json.dumps(results),
        }
    ],
}

messages.append(tool_response)

response2 = anthropic_client.messages.create(
    model="claude-3-sonnet-20240229",
    messages=messages,
    max_tokens=1024,
    tools=[anthropic_tool]
)
print(response2.content[0].text)

The blog posts provide some interesting information about pelicans:

  • Pelicans are a group of aquatic birds known for their distinctive pouch-like bills that they use to scoop up fish and other prey from the water.

  • Common species include the Brown Pelican found in the Americas that plunges into the water from heights to catch fish, and the American White Pelican with white plumage and a large pink bill found in the Americas and Eurasia.

  • The Brown Pelican is one of the largest pelican species, with an average height around 26-30 inches and bills up to 11 inches long.

  • One blog featured creative AI-generated images and descriptions of pelicans riding bicycles by different language models as a novel benchmark test.

  • Another blog highlighted using an AI to have pelican personas discuss topics like data journalism and video analysis in a pelican newscaster style.

  • There is work on compact but capable language models like SmolLM2 that can discuss topics like pelicans while running efficiently on devices.

So in summary, the blogs cover pelican biology, using pelican prompts to creatively test language models, and developing efficient on-device models that can still discuss pelicans knowledgeably. Let me know if you need any other details!

simonw commented 2 weeks ago

Gemini calls it function calling: https://ai.google.dev/gemini-api/docs/function-calling and https://ai.google.dev/gemini-api/docs/function-calling/tutorial (which has curl examples).

Here's a useful curl example from that page: https://ai.google.dev/gemini-api/docs/function-calling

curl "https://generativelanguage.googleapis.com/v1beta/models/gemini-pro:generateContent?key=$GEMINI_API_KEY" \
  -H 'Content-Type: application/json' \
  -d '
{
    "contents": [
        {
            "role": "user",
            "parts": [
                {
                    "text": "Which theaters in Mountain View show Barbie movie?"
                }
            ]
        },
        {
            "role": "model",
            "parts": [
                {
                    "functionCall": {
                        "name": "find_theaters",
                        "args": {
                            "location": "Mountain View, CA",
                            "movie": "Barbie"
                        }
                    }
                }
            ]
        },
        {
            "role": "user",
            "parts": [
                {
                    "functionResponse": {
                        "name": "find_theaters",
                        "response": {
                            "name": "find_theaters",
                            "content": {
                                "movie": "Barbie",
                                "theaters": [
                                    {
                                        "name": "AMC Mountain View 16",
                                        "address": "2000 W El Camino Real, Mountain View, CA 94040"
                                    },
                                    {
                                        "name": "Regal Edwards 14",
                                        "address": "245 Castro St, Mountain View, CA 94040"
                                    }
                                ]
                            }
                        }
                    }
                }
            ]
        }
    ],
    "tools": [
        {
            "functionDeclarations": [
                {
                    "name": "find_movies",
                    "description": "find movie titles currently playing in theaters based on any description, genre, title words, etc.",
                    "parameters": {
                        "type": "OBJECT",
                        "properties": {
                            "location": {
                                "type": "STRING",
                                "description": "The city and state, e.g. San Francisco, CA or a zip code e.g. 95616"
                            },
                            "description": {
                                "type": "STRING",
                                "description": "Any kind of description including category or genre, title words, attributes, etc."
                            }
                        },
                        "required": [
                            "description"
                        ]
                    }
                },
                {
                    "name": "find_theaters",
                    "description": "find theaters based on location and optionally movie title which is currently playing in theaters",
                    "parameters": {
                        "type": "OBJECT",
                        "properties": {
                            "location": {
                                "type": "STRING",
                                "description": "The city and state, e.g. San Francisco, CA or a zip code e.g. 95616"
                            },
                            "movie": {
                                "type": "STRING",
                                "description": "Any movie title"
                            }
                        },
                        "required": [
                            "location"
                        ]
                    }
                },
                {
                    "name": "get_showtimes",
                    "description": "Find the start times for movies playing in a specific theater",
                    "parameters": {
                        "type": "OBJECT",
                        "properties": {
                            "location": {
                                "type": "STRING",
                                "description": "The city and state, e.g. San Francisco, CA or a zip code e.g. 95616"
                            },
                            "movie": {
                                "type": "STRING",
                                "description": "Any movie title"
                            },
                            "theater": {
                                "type": "STRING",
                                "description": "Name of the theater"
                            },
                            "date": {
                                "type": "STRING",
                                "description": "Date for requested showtime"
                            }
                        },
                        "required": [
                            "location",
                            "movie",
                            "theater",
                            "date"
                        ]
                    }
                }
            ]
        }
    ]
}
'

I got back:

{
  "candidates": [
    {
      "content": {
        "parts": [
          {
            "text": "OK. I found two theaters in Mountain View showing Barbie: AMC Mountain View 16 and Regal Edwards 14."
          }
        ],
        "role": "model"
      },
      "finishReason": "STOP",
      "index": 0,
      "safetyRatings": [
        {
          "category": "HARM_CATEGORY_SEXUALLY_EXPLICIT",
          "probability": "NEGLIGIBLE"
        },
        {
          "category": "HARM_CATEGORY_HATE_SPEECH",
          "probability": "NEGLIGIBLE"
        },
        {
          "category": "HARM_CATEGORY_HARASSMENT",
          "probability": "NEGLIGIBLE"
        },
        {
          "category": "HARM_CATEGORY_DANGEROUS_CONTENT",
          "probability": "NEGLIGIBLE"
        }
      ]
    }
  ],
  "usageMetadata": {
    "promptTokenCount": 448,
    "candidatesTokenCount": 25,
    "totalTokenCount": 473
  },
  "modelVersion": "gemini-pro"
}
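
For comparison, the google-generativeai Python SDK can drive the same loop with its automatic function calling support. A minimal sketch (the find_theaters stub here is invented for illustration):

import google.generativeai as genai

genai.configure(api_key=GEMINI_API_KEY)

def find_theaters(location: str, movie: str = "") -> list:
    "Find theaters based on location and optionally a movie title."
    # Stub standing in for a real lookup
    return [{"name": "AMC Mountain View 16", "address": "2000 W El Camino Real"}]

model = genai.GenerativeModel("gemini-pro", tools=[find_theaters])
chat = model.start_chat(enable_automatic_function_calling=True)
response = chat.send_message("Which theaters in Mountain View show the Barbie movie?")
print(response.text)
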
simonw commented 2 weeks ago

Decided to see if I could figure it out for llama-cpp-python and llm-gguf. Managed to get this working:

llm -m Hermes-3-Llama-3.1-8B 'tell me what the blog says about pelicans' --no-stream -s 'you derive keywords from questions and search for them'

And these code changes:

         if self._model is None:
-            self._model = Llama(
-                model_path=self.model_path, verbose=False, n_ctx=0  # "0 = from model"
-            )

+            self._model = Llama(
+                model_path=self.model_path, verbose=False, n_ctx=self.n_ctx,
+                chat_format="chatml-function-calling"
+            )

@@ -171,7 +221,27 @@ class GgufChatModel(llm.Model):

         if not stream:
             model = self.get_model()
-            completion = model.create_chat_completion(messages=messages)
+            completion = model.create_chat_completion(messages=messages, tools=[
+                {
+                    "type": "function",
+                    "function": {
+                        "name": "search_blog",
+                        "description": "Search for posts on the blog.",
+                        "parameters": {
+                            "type": "object",
+                            "properties": {
+                                "query": {
+                                    "type": "string",
+                                    "description": "Search query, keywords only",
+                                }
+                            },
+                            "required": ["query"],
+                            "additionalProperties": False,
+                        },
+                    }
+                }
+            ], tool_choice="auto")
+            breakpoint()

Which gave me this for completion at the breakpoint:

{
  "id": "chatcmpl-60256c0d-744d-449a-a433-3faa6224f770",
  "object": "chat.completion",
  "created": 1730782682,
  "model": "/Users/simon/Library/Application Support/io.datasette.llm/gguf/models/Hermes-3-Llama-3.1-8B.Q4_K_M.gguf",
  "choices": [
    {
      "finish_reason": "tool_calls",
      "index": 0,
      "logprobs": null,
      "message": {
        "role": "assistant",
        "content": null,
        "tool_calls": [
          {
            "id": "call__0_search_blog_cmpl-c4456f81-e402-4f82-ab4d-6063c0b894ff",
            "type": "function",
            "function": {
              "name": "search_blog",
              "arguments": "{\"query\": \"pelicans blog\"}"
            }
          }
        ],
        "function_call": {
          "name": "search_blog:",
          "arguments": "{\"query\": \"pelicans blog\"}"
        }
      }
    }
  ],
  "usage": {
    "completion_tokens": 8,
    "prompt_tokens": 202,
    "total_tokens": 210
  }
}

So it looks like this is feasible, using that chat_format="chatml-function-calling" option. The tool_choice="auto" was necessary too.
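
Pulling the call back out of that completion dict and running it is then straightforward (a sketch reusing the blog_search function from earlier in this thread):

choice = completion["choices"][0]
if choice["finish_reason"] == "tool_calls":
    tool_call = choice["message"]["tool_calls"][0]
    # Arguments arrive as a JSON-encoded string, just like the OpenAI API
    results = blog_search(json.loads(tool_call["function"]["arguments"])["query"])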

simonw commented 2 weeks ago

OK, I now have function calling examples for OpenAI, Anthropic, Gemini and Llama.cpp. That's probably enough.

The most complex examples are the ones that need to persist and then re-send a tool ID (OpenAI and Anthropic).

AtakanTekparmak commented 2 weeks ago

Is this going to be only about the implementation of function calling from major providers, or more of a discussion of further methods as well? I have been using function calling in pure Python for a while and have seen a significant performance improvement, especially in sub-20B models. I first started prototyping with this prompt and then settled on the implementation in agento, with this system prompt and this engine implementation, inside the execute_python_code method. An example function call looks like this:

             ╭────────────────────────────────────────────────────────╮                                                                                                     
 user        │ Can you get 4 apples, eat 1 of them and sell the rest? │                                                                                                     
             ╰────────────────────────────────────────────────────────╯                                                                                                     
             ╭─────────────────────────────────────────────────╮                                                                                                            
 Apple       │ ```python                                       │                                                                                                            
 Agent       │ apples = get_apples(4)                          │                                                                                                            
             │ apples_after_eating = eat_apples(apples, 1)     │                                                                                                            
             │ money_earned = sell_apples(apples_after_eating) │                                                                                                            
             │ ```                                             │                                                                                                            
             ╰─────────────────────────────────────────────────╯    

where get_apples, eat_apples and sell_apples are defined like the following:

from typing import List

def get_apples(quantity: int) -> List[str]:
    """
    Get a certain quantity of apples.

    Args:
        quantity (int): The quantity of apples to get.

    Returns:
        List[str]: A list of apples.
    """
    return ["Apple" for _ in range(quantity)]

def eat_apples(apples: List[str], quantity: int) -> List[str]:
    """
    Eat a certain quantity of apples.

    Args:
        apples (List[str]): A list of apples.
        quantity (int): The quantity of apples to eat.

    Returns:
        List[str]: The remaining apples.
    """
    return apples[quantity:] if quantity < len(apples) else []

def sell_apples(apples: List[str]) -> str:
    """
    Sell all the apples provided.

    Args:
        apples (List[str]): A list of apples.

    Returns:
        str: The money earned from selling the apples.
    """
    return f"${len(apples) * 1}"
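
Executing the agent's generated block against those functions is then just an exec() in a namespace that exposes them. A minimal sketch of the idea (not agento's actual engine):

agent_code = """
apples = get_apples(4)
apples_after_eating = eat_apples(apples, 1)
money_earned = sell_apples(apples_after_eating)
"""

namespace = {"get_apples": get_apples, "eat_apples": eat_apples, "sell_apples": sell_apples}
exec(agent_code, namespace)
print(namespace["money_earned"])  # $3
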
simonw commented 2 weeks ago

My goal is to expand the model plugin mechanism so that new models can be registered that support tool usage. Ideally this would enable people to write their own plugins that implement tool usage via prompting if they want to.
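
Purely as a hypothetical sketch of what registering such a model could look like, built on llm's existing register_models plugin hook (the supports_tools flag and the body of execute are invented here, not a real API):

import llm

class MyToolCapableModel(llm.Model):
    model_id = "my-tool-model"
    supports_tools = True  # hypothetical flag; no such attribute exists yet

    def execute(self, prompt, stream, response, conversation):
        # A prompt-based plugin could inject tool specs into the system
        # prompt here and parse tool calls back out of the model output
        yield "..."

@llm.hookimpl
def register_models(register):
    register(MyToolCapableModel())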

kennethreitz commented 2 weeks ago

Following

chrisVillanueva commented 2 weeks ago

This is promising. Subscribed.

ErikBjare commented 2 weeks ago

In gptme I'm using a tool calling format based on markdown codeblocks in the normal text output.

It predates tool calling APIs, so it works by detecting tool calls in the streamed output and interrupting the stream once a valid tool call is complete.

Example, to save a file hello.txt:

```save hello.txt
Hello world
```

To run ipython, where functions can be registered:

```ipython
search("search query", engine="duckduckgo")
```

You can find the full system prompt and detailed examples here: https://gptme.org/docs/prompts.html

I've also worked on an XML format of this (https://github.com/ErikBjare/gptme/pull/121), as well as support for the actual tool calling APIs now available via OpenAI, Anthropic, OpenRouter and Ollama (https://github.com/ErikBjare/gptme/pull/219).
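
That codeblock format is also easy to machine-parse. A rough sketch of a parser for it (not gptme's actual implementation):

import re

TOOL_BLOCK = re.compile(r"```(\w+)([^\n]*)\n(.*?)```", re.DOTALL)

def extract_tool_calls(text):
    "Yield (tool, args, body) for each fenced tool block in the model output."
    for match in TOOL_BLOCK.finditer(text):
        tool, args, body = match.groups()
        yield tool, args.strip(), body

for tool, args, body in extract_tool_calls("```save hello.txt\nHello world\n```"):
    print(tool, args, repr(body))  # save hello.txt 'Hello world\n'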

simonw commented 2 weeks ago

> In gptme I'm using a tool calling format based on markdown codeblocks in the normal text output.

I like that a lot. Feels like the kind of format that any capable LLM could be convinced to output, and it's very easy to parse. I'll think about how that might be supported.

bsima commented 3 days ago

Note that Ollama supports the same tool calling API as OpenAI: https://ollama.com/blog/tool-support
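
That means the OpenAI example from earlier in this thread should work against a local model just by repointing the client at Ollama's OpenAI-compatible endpoint. A sketch (it assumes a tool-capable model such as llama3.1 has already been pulled):

import openai

ollama_client = openai.OpenAI(
    base_url="http://localhost:11434/v1",
    api_key="ollama",  # required by the client but ignored by Ollama
)
response = ollama_client.chat.completions.create(
    model="llama3.1", messages=messages, tools=tools
)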

bsima commented 3 days ago

Using a markdown-style format like gptme's is a great idea, but I think it's also worth supporting the native function calling formats for each model, since models are trained on those formats specifically, so we can expect them to work.

My suggestion would be to have a class that abstracts over the model-native formats: each model can implement its own to_json method, and llm provides a to_markdown method as a fallback.

Here's some scratch code of what I would suggest:

import json
from abc import ABC, abstractmethod

class ToolInterface(ABC):
    @abstractmethod
    def call(self): ...

    @abstractmethod
    def doc(self): ...

class MyTool(ToolInterface):
    def call(self):
        ...  # implement the tool logic

    def doc(self):
        ...  # return a dict that describes the tool

class ModelWithJsonTools:
    def __init__(self, tools, **kwargs):
        self.tools = tools

    def get_tools(self):
        # Returns the docs for all the tools in the JSON format specific to this model.
        # format_tool_spec could be a shared function, or specific to the model/plugin.
        return json.dumps([format_tool_spec(tool) for tool in self.tools])

a = llm.get_model(ModelWithJsonTools, tools=[MyTool])

class ModelWithMarkdownTools:
    # Same as ModelWithJsonTools except:
    def get_tools(self):
        # llm should provide the markdown spec for a consistent format
        return llm.tools.format_markdown_spec(self.tools)

b = llm.get_model(ModelWithMarkdownTools, tools=[MyTool])

The markdown spec that gptme uses could be borrowed; it looks good.

Maybe this functionality should be implemented in a plugin like llm-tools. Is that possible or advisable?