Custom multi-input tools

aguynamedben commented 3 months ago

I'm interested in building custom tools (tools.Tool) that support multiple parameters, i.e. "multi-input tools" in LangChain.

The LangChain docs for Custom Tools provide two examples of custom tools: one that accepts a single parameter and another that accepts multiple parameters. The docs for Agent Types specify which LangChain agents support multi-input tools in the "Supports Multi Input Tools" column.

It seems that for langchaingo to support multi-input tools, the tools.Tool interface would need enhancements, as its Call() method currently accepts only a single string as tool input. Is that accurate?

type Tool interface {
    Name() string
    Description() string
    // This seems to indicate there is no multi-input tool support for tools.Tool
    Call(ctx context.Context, input string) (string, error)
}

Are there any examples of langchaingo's Agents using multi-input tools? I looked through the examples and didn't see any. I'm aware I can use the llms package and Model to call specific LLMs and handle the tool call responses myself, but I was eager to build a bunch of tools that adhere to the tools.Tool interface and use them with agents.

Thank you for the work you do in making langchaingo publicly available. It has taught me a lot and been a joy to work with.

aguynamedben commented 3 months ago

Playing with these ideas...

schema.AgentAction would support ToolInputSingle and ToolInputMulti

// AgentAction is the agent's action to take.
type AgentAction struct {
    Tool            string
    ToolInputSingle string
    ToolInputMulti  map[string]any
    Log             string
    ToolID          string
}

tools.Tool would add IsMultiInput() and CallMultiInput(), and rename Call() to CallSingleInput()

// Tool is a tool for the llm agent to interact with different applications.
type Tool interface {
    Name() string
    Description() string
    IsMultiInput() bool
    CallSingleInput(ctx context.Context, input string) (string, error)
    CallMultiInput(ctx context.Context, input map[string]any) (string, error)
}

Tools would provide these new methods, for example here's the updated Calculator tool:

package tools

import (
    "context"
    "fmt"

    "github.com/tmc/langchaingo/callbacks"
    "go.starlark.net/lib/math"
    "go.starlark.net/starlark"
)

// Calculator is a tool that can do math.
type Calculator struct {
    CallbacksHandler callbacks.Handler
}

var _ Tool = Calculator{}

// Description returns a string describing the calculator tool.
func (c Calculator) Description() string {
    return `Useful for getting the result of a math expression. 
    The input to this tool should be a valid mathematical expression that could be executed by a starlark evaluator.`
}

// Name returns the name of the tool.
func (c Calculator) Name() string {
    return "calculator"
}

// IsMultiInput reports whether the tool takes multiple inputs; the calculator does not.
func (c Calculator) IsMultiInput() bool {
    return false
}

// CallSingleInput evaluates the input using a starlark evaluator and returns the
// result as a string. If the evaluator errors, the error is given in the result
// to give the agent the ability to retry.
func (c Calculator) CallSingleInput(ctx context.Context, input string) (string, error) {
    if c.CallbacksHandler != nil {
        c.CallbacksHandler.HandleToolStart(ctx, input)
    }

    v, err := starlark.Eval(&starlark.Thread{Name: "main"}, "input", input, math.Module.Members)
    if err != nil {
        return fmt.Sprintf("error from evaluator: %s", err.Error()), nil //nolint:nilerr
    }
    result := v.String()

    if c.CallbacksHandler != nil {
        c.CallbacksHandler.HandleToolEnd(ctx, result)
    }

    return result, nil
}

// CallMultiInput always returns an error because the calculator only supports a single input.
func (c Calculator) CallMultiInput(ctx context.Context, input map[string]any) (string, error) {
    return "", fmt.Errorf("CallMultiInput not supported in tool %s", c.Name())
}

Executor.doAction would be updated to first check what tool.IsMultiInput() returns. If false, tool.CallSingleInput() would be called; if true, tool.CallMultiInput() would be called.

Go doesn't support union types like TypeScript does, so I don't think there's any way to make tool.Call support multiple method signatures (i.e. one for string, one for map[string]any).

The only downsides I see to this approach are:

- Users of langchaingo who depend on the tools.Tool interface for in-house code will find they need to implement IsMultiInput() and CallMultiInput() when they upgrade langchaingo. However, they may appreciate being able to create multi-input tools.
- The agents (OneShotZeroAgent, OpenAIFunctionsAgent) may need some updates. I'm not sure how much they will actually need, however, if Executor.doAction and AgentAction are the main places where things change. This may enable the OpenAIFunctionsAgent to support multi-input tool calls anyway, which is rad!

The Go error messages for the Tool interface changing actually make it pretty clear to any users of tools.Tool in the wild what they need to do to maintain compatibility, i.e. they need to add tool.IsMultiInput() and tool.CallMultiInput().

I'm willing to take a stab at implementing if this API seems fine and it can likely be merged!

The main alternative I can think of is to change the ToolInput type to always be map[string]any instead of string, making the single-input tool call the "exception" that is detected at runtime. This might be better, as the function names would remain the same and only the type passed to Tool.Call() would change. We would pair that with a clear note in the changelog that if you want to build a tool that supports a single input, use the __arg1 hack that OpenAI seemed to use to go from single -> multi tools. We could provide an example for this.

This is a less explicit upgrade path, however. I believe the Python LangChain library is closer to this: it uses a type union for the tool input parameter (roughly Union[str, dict]).

aguynamedben commented 3 months ago

I started a branch with my approach here: https://github.com/tmc/langchaingo/compare/main...aguynamedben:langchaingo:aguynamedben/multi-input-tools?expand=1

It's working with some in-house tools I made. Next, I'm going to get the executor working with tools that implement Tool.CallMulti()

aguynamedben commented 3 months ago

Hmm, after further investigation it seems deeper changes would be needed in the chains package. Run() and Predict() seem to ignore any tool responses and instead rely on regex parsing of a string. Both of those methods return (string, error). Let me know if you have any ideas or plans around multi-input tools.

rogerscuall commented 3 months ago

Maybe this helps.

https://github.com/tmc/langchaingo/blob/main/examples/anthropic-tool-call-example/anthropic-tool-call-example.go

This example uses Anthropic tool use, which lets you pass multiple parameters.

tmc / langchaingo

Custom multi-input tools #987