tarasglek / chatcraft.org

Developer-oriented ChatGPT clone
https://chatcraft.org/
MIT License

Let's try to define and prototype tools #86

Closed. humphd closed this issue 1 year ago.

humphd commented 1 year ago

We've spoken a lot about "tools," and the abilities they'd unlock. Now that we have the message types in place, routing, and the db backend, I think we have most of the pieces necessary to start playing around with this.

It likely makes sense to consider this in the context of Langchain, which already gives us access to Agents and Tools. Here, a "tool" means:

interface Tool {
  call(arg: string): Promise<string>;
  name: string;
  description: string;
}

So we could begin by providing a way to write/upload/store JS (or TS that we transpile, or really any language compiled to WASM) that exports a call(arg: string): Promise<string> function.
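
For example, a minimal tool module under that contract might look like the following sketch (the word_count tool is purely illustrative, not something that exists):

// Hypothetical tool module matching the Tool shape above.
export const name = "word_count";
export const description = "Counts the words in the input string";

export async function call(arg: string): Promise<string> {
  return String(arg.trim().split(/\s+/).filter(Boolean).length);
}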

We now have /c/* for chats. What about /t/* for tools?

How do we use these tools? Do we create a Toolkit for an agent to use, based on all the /t/* tools owned by the user? Or do we have some way to insert a Tool as context in a message? Or both?

humphd commented 1 year ago

OpenAI announced changes to how models interact with functions:

Function calling

Developers can now describe functions to gpt-4-0613 and gpt-3.5-turbo-0613, and have the model intelligently choose to output a JSON object containing arguments to call those functions. This is a new way to more reliably connect GPT's capabilities with external tools and APIs.

These models have been fine-tuned to both detect when a function needs to be called (depending on the user’s input) and to respond with JSON that adheres to the function signature. Function calling allows developers to more reliably get structured data back from the model. For example, developers can:

Create chatbots that answer questions by calling external tools (e.g., like ChatGPT Plugins)

Convert queries such as “Email Anya to see if she wants to get coffee next Friday” to a function call like send_email(to: string, body: string), or “What’s the weather like in Boston?” to get_current_weather(location: string, unit: 'celsius' | 'fahrenheit').

Convert natural language into API calls or database queries

Convert “Who are my top ten customers this month?” to an internal API call such as get_customers_by_revenue(start_date: string, end_date: string, limit: int), or “How many orders did Acme, Inc. place last month?” to a SQL query using sql_query(query: string).

Extract structured data from text

Define a function called extract_people_data(people: [{name: string, birthday: string, location: string}]), to extract all people mentioned in a Wikipedia article.

These use cases are enabled by new API parameters in our /v1/chat/completions endpoint, functions and function_call, that allow developers to describe functions to the model via JSON Schema, and optionally ask it to call a specific function. Get started with our developer documentation and add evals if you find cases where function calling could be improved.

This would make it pretty easy to host tools in ChatCraft that can be easily invoked by the model.
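
For reference, a request using the new parameters might look roughly like this (a sketch based on the announcement above; the get_current_weather schema is only illustrative):

// Sketch of a /v1/chat/completions request body using the new parameters.
const body = {
  model: "gpt-3.5-turbo-0613",
  messages: [{ role: "user", content: "What's the weather like in Boston?" }],
  functions: [
    {
      name: "get_current_weather",
      description: "Get the current weather in a given location",
      parameters: {
        type: "object",
        properties: {
          location: { type: "string", description: "The city, e.g. Boston, MA" },
          unit: { type: "string", enum: ["celsius", "fahrenheit"] },
        },
        required: ["location"],
      },
    },
  ],
  // Optionally force a specific function instead of letting the model decide:
  // function_call: { name: "get_current_weather" },
};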

humphd commented 1 year ago

Developer docs for Function Calling are here.

humphd commented 1 year ago

OpenAI Cookbook Example of Function Calling here

humphd commented 1 year ago

https://twitter.com/nfcampos/status/1671212156711325696 has an example of using raw OpenAI functions with langchain.js

tarasglek commented 1 year ago

Note that OpenAI functions add a new "function" message type.

https://platform.openai.com/docs/api-reference/chat/create
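
Concretely, the round trip with this new message type looks roughly like the following sketch (values are illustrative):

// 1) The model replies with an assistant message that has no content,
//    only a function_call describing what it wants us to run.
const assistantMessage = {
  role: "assistant",
  content: null,
  function_call: {
    name: "get_current_weather",
    arguments: '{"location": "Boston, MA"}', // JSON-encoded arguments string
  },
};

// 2) After running the function locally, we send the result back
//    as a message with the new "function" role.
const functionMessage = {
  role: "function",
  name: "get_current_weather",
  content: JSON.stringify({ temperature: 22, unit: "celsius" }),
};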

tarasglek commented 1 year ago

Langchain.js function call stuff: https://github.com/hwchase17/langchainjs/blob/53b6cb4fd9365e26b21c5e40ed2ddd8077562fdb/langchain/src/chat_models/openai.ts#L87

tarasglek commented 1 year ago

To me, tools are user-defined functions (interconnection functions, formatting functions) plus OpenAI's ability to call them, with minimal intervention from me to feed data back into GPT (and the ability to choose not to feed it back). Combined with our existing code-editing and retry-with-a-different-model features, this should be pretty powerful.

So I think there is only one way to do this.

1) Add a second "system message" where we can edit the schema for OpenAI functions. This will likely be a catalog of built-in functions (showCode(code, language), some DB operations) plus user-defined functions like "runSQL". Add a flag to force functions to run, so "just show me the code" becomes a forced "renderCode" call.

2) When the model tells us to run code by replying with an "assistant" message that has "content": null and a "function_call", we render the function call similar to how we render code now, and support per-function-call auto-run (e.g., renderCode is harmless).

3) If auto-run is specified, we immediately pipe results back. If not, we have the option of running the function and optionally sending the results back (useful when using the model to generate SQL queries and no further interaction is needed); the same applies when we have a function to apply CloudFormation, etc. (See the sketch after this list.)
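
A minimal sketch of what steps 2 and 3 could look like (the registry and autoRun flag are assumptions, not existing ChatCraft code):

type RegisteredFunction = {
  autoRun: boolean; // e.g. true for harmless functions like renderCode
  fn: (args: Record<string, unknown>) => Promise<string>;
};

// Dispatch a function_call from the model: only run it automatically
// (piping the result back) when the function is flagged auto-run.
async function handleFunctionCall(
  registry: Map<string, RegisteredFunction>,
  functionCall: { name: string; arguments: string }
): Promise<string | null> {
  const entry = registry.get(functionCall.name);
  if (!entry) {
    throw new Error(`Unknown function: ${functionCall.name}`);
  }
  if (!entry.autoRun) {
    return null; // wait for the user to decide whether to run and/or send back
  }
  const args = JSON.parse(functionCall.arguments) as Record<string, unknown>;
  return entry.fn(args); // caller pipes this back as a "function" message
}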

That covers all my immediate tool needs. It gets me bash, SQL, etc. consoles, and maybe even a code-writing agent that can modify code in a repository that I expose via some API.

humphd commented 1 year ago

Let me think out loud, based on what you wrote above.

First, a tool or function (I think "function" is probably closer to my concept of what this is) has the following:

  1. name
  2. description
  3. input arg schema (JSON Schema)
  4. implementation in JS

A bunch of this is based on https://github.com/openai/openai-node/blob/dc821be3018c832650e21285bade265099f99efb/api.ts#L26-L50 and https://platform.openai.com/docs/api-reference/chat/create#chat/create-functions, but we can also colocate the implementation, since we'll run these functions in the browser.

Here's an example:

{
  name: "echo",
  description: "echoes the input back",
  parameters: {
    type: "object",
    properties: {
      value: {
        type: "string",
        description: "The value to echo back",
      },
    },
    required: ["value"],
  },
  code: "function(value) {\n return value;\n }"
}

We can put these functions in a new table in the database. This lets us re-use them across different chats and refer to them by id (or similar) from within a chat instead of having them live there. A shared chat could embed them so you can pass them around.
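
As a sketch of what that table could look like (assuming a Dexie-style IndexedDB table; the record fields are illustrative):

import Dexie, { Table } from "dexie";

// Hypothetical record shape for a stored function.
type FunctionRecord = {
  id: string;
  name: string;
  description: string;
  parameters: object; // JSON Schema for the arguments
  code: string; // the implementation as JS source
};

class FunctionsDBSketch extends Dexie {
  functions!: Table<FunctionRecord, string>;

  constructor() {
    super("FunctionsDBSketch");
    this.version(1).stores({
      functions: "id, name", // primary key plus an index on name
    });
  }
}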

In terms of the UI for working with them, we could either do it all in the current message view, or we could introduce a new view. I think I'd do the latter. For example:

This isn't the right UI, but imagine having a way to easily add/remove functions by name in the system prompt message, and also to go write a new one:

[Screenshot: 2023-07-11 11:03 AM]

tarasglek commented 1 year ago
  1. name
  2. description
  3. input arg schema (JSON Schema)
  4. implementation in JS

I slept on this. I think a tool is going to be something like a JS module plus TypeScript type annotations for it. We can use TypeScript tooling to convert that mapping into the JSON Schema OpenAI requires.

So a tool would be an http://-loadable JS module with some metadata describing what's in it. While we're at it, we can also include a system prompt in that metadata bundle.
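
To illustrate the mapping (just the idea, not working tooling): a typed export like export function echo(value: string) plus its doc comment could be turned into the function spec OpenAI expects:

// The function spec OpenAI expects, as it could be derived from the
// TypeScript signature and doc comment of an echo(value: string) export.
const derivedFunctionSpec = {
  name: "echo",
  description: "echoes the input back",
  parameters: {
    type: "object",
    properties: {
      value: { type: "string", description: "The value to echo back" },
    },
    required: ["value"],
  },
};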

humphd commented 1 year ago

Thinking about this as an ESM module is kind of interesting. To take my previous example and rewrite it you might have this:

export const name = "echo";

export const description = "echoes the input back";

export const parameters = {
  type: "object",
  properties: {
    value: {
      type: "string",
      description: "The value to echo back",
    },
  },
  required: ["value"],
};

export default function (value: string) {
  return value;
}

Loading that out of the db is pretty easy to do, since we can make a Blob URL. I was thinking that the router can be extended to add a /f/:function-id route (similar to our /c/:chat-id).

You'd also be able to load shared tools remotely that way: chatcraft.org/f/:user/:function-id.
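
A rough sketch of how loading could work for both the local and shared cases (the db.functions table shape is an assumption at this point):

// Load a function module either from a shared URL
// (e.g. https://chatcraft.org/f/:user/:function-id) or from the local db via a Blob URL.
async function loadFunctionModule(
  db: { functions: { get(id: string): Promise<{ code: string }> } }, // assumed shape
  idOrUrl: string
) {
  if (idOrUrl.startsWith("http")) {
    return import(/* @vite-ignore */ idOrUrl);
  }
  const record = await db.functions.get(idOrUrl);
  const blob = new Blob([record.code], { type: "text/javascript" });
  const url = URL.createObjectURL(blob);
  try {
    return await import(/* @vite-ignore */ url);
  } finally {
    URL.revokeObjectURL(url);
  }
}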

I don't think you'd want a system prompt in a function, since you're likely to need/want more than 1 function in a chat.

tarasglek commented 1 year ago

I was thinking we could pre-process modules written more naturally:

/**
 * echoes the input back
 * @param value The value to echo back
 */
export function echo(value: string) {
  return value;
}

I would like to be able to include multiple functions per module this way, but your way is much easier to implement.

We could combine system prompts when multiple modules are used

humphd commented 1 year ago

I think making single-function modules is more in the spirit of the APIs, where you provide an array of functions to use. I agree that you'll want multiple functions in lots of cases, but being able to compose them together would be more flexible. I also think that separating functions from one another, and also from the chat/messages, makes them more reusable.

In the UI we could have a way to get GPT to generate the structured bits (name, params, etc.) based on the code you write, though we could also provide a way to enter them manually.

So can we get practical?

  1. What's our naming here, is it "Tool" or "Function"?
  2. Add db table to store them (metadata + blob for code)
  3. Add routing to be able to access them and load dynamically via URL
  4. Some initial UI for writing/editing

Once those are in place, we can work on integrating the UI into the current chat, system prompt, etc.

tarasglek commented 1 year ago

I'm OK with single-function modules, but I think a prompt about when to use the function should live with it.

  1. To me, a Tool is a combination of interaction functions (e.g., a fetch call to ClickHouse), formatting functions (e.g., render as a graph or Markdown), and a system prompt to direct ChatCraft to use those functions
  2. The db is a follow-up to me. If we can load module payloads, we can list them in a blog for now, somewhere outside of ChatCraft
  3. Same opinion as above
  4. Again, I'd be OK with editing the tool outside of ChatCraft for an MVP

humphd commented 1 year ago

So are you imagining this more like an "App Store" where these functions are vetted and stored centrally somewhere outside of ChatCraft? I was imagining us doing this without a central store, but I can see the value of both ways. This is why figuring out the db/routing/loading first was important to me, since we need them to have a home in your existing data.

tarasglek commented 1 year ago

I'm imagining it as programming Lego. A toolbox/app store is nice but not required to use tools.

humphd commented 1 year ago

I'm going to continue to push to have this become more concrete so I can start coding.

"Programming Lego" is nice, and to me it implies the idea of individual, atomic, building blocks. So my proposal is:

  1. Functions are code + metadata we store locally in the db (lego). They are ES Modules that we can load on demand at runtime from db via Blob URL and invoke (i.e., they live outside the app bundle).
  2. System Prompts can include one or more functions (e.g., by name), but they aren't "stored" there. Including functions in a prompt means the LLM may ask us to call them in the response.
  3. A Tool is the combo of a system prompt + functions I can use to do something, so essentially: a chat with custom system prompt and included functions.

I think the above would be enough to build a prototype to try, and then we can figure out editing, sharing, etc.

Sound good? What did I get wrong here?

tarasglek commented 1 year ago

2 and 3 are good, but I'm confused as to what "store locally in the db" means.

humphd commented 1 year ago

I mean we can store the actual function code in the db as either a string or a Blob, and then load and use it when we need it at runtime:

// Get a function from the db
const fn = await db.functions.get(id);
// Grab its "code", which is a Blob of the JS ES Module with type text/javascript
const blob = fn.code;
// Use that Blob to create a dynamic URL
const url = URL.createObjectURL(blob);
// Load the module at runtime and grab the function we want to use
const { myFunction } = await import(url);
// Call the function
await myFunction(inputData);

tarasglek commented 1 year ago

We can do that. I think I would eventually want to import external files, but that's a bigger feature.

tarasglek commented 1 year ago

I'm sold on your proposal

humphd commented 1 year ago

Another question before I start implementing this.

In the UI for creating/editing a function, I need to know how to handle metadata for a function. What I'm imagining is that you can go to chatcraft.org/f/new to create a function, or to chatcraft.org/f/:id (or maybe just use the function name as the id) to edit an existing one. I'm trying to decide how to do the layout for what you see when you get there.

I think we have two options:

  1. A bunch of form inputs (function name, description, etc) followed by an editor view for the code.
  2. Just an editor view, and all metadata is somehow extracted from the code.

Doing option 1 is obviously easier from an implementation point of view, but is there value in trying to do everything via code? Without needing complex parsing, we could use what I suggested above and put everything into exports on the module:

export const name = "echo";

export const description = "echoes the input back";

export const parameters = {
  type: "object",
  properties: {
    value: {
      type: "string",
      description: "The value to echo back",
    },
  },
  required: ["value"],
};

export default function (value: string) {
  return value;
}

What's nice about this is that we can validate the module to make sure it's working before we save to db. When we save to db, we can import() the module and extract the metadata to put into other fields.
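
A sketch of what that validation could look like (the helper name and checks are hypothetical):

// Import the module from a Blob URL and check it has the exports we expect
// before saving anything to the db.
async function validateFunctionModule(code: string) {
  const blob = new Blob([code], { type: "text/javascript" });
  const url = URL.createObjectURL(blob);
  try {
    const mod = await import(/* @vite-ignore */ url);
    if (typeof mod.default !== "function") {
      throw new Error("Module must export a default function");
    }
    const { name, description, parameters } = mod;
    if (!name || !description || !parameters) {
      throw new Error("Module must export name, description, and parameters");
    }
    return { name, description, parameters };
  } finally {
    URL.revokeObjectURL(url);
  }
}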

Which way should I go?

tarasglek commented 1 year ago

I vote for 2.

humphd commented 1 year ago

Started playing with this, and it's going to work:

[Screenshot: 2023-07-16 2:03 PM]

In the screenshot above, you can see a code editor, and I'm able to parse the module, dynamically import it, and extract the features I want (notice the name and description being used dynamically in the header).

Code to do the extraction is this:

const parseModule = async (code: string) => {
  const blob = new Blob([code], { type: "text/javascript" });
  const url = URL.createObjectURL(blob);

  try {
    return await import(/* @vite-ignore */ url);
  } catch (err: any) {
    console.warn("Unable to parse module", err);
    throw new Error(`Unable to parse module: ${err.message}`);
  } finally {
    URL.revokeObjectURL(url);
  }
};

I'll keep playing and post a PR this week.

tarasglek commented 1 year ago

Microsoft released a library to do something similar to what we are doing here: https://github.com/microsoft/TypeChat/tree/4d34a5005c67bc49444e6e6d016a9262cf24b38d

humphd commented 1 year ago

This is done!