Closed: humphd closed this issue 1 year ago.
OpenAI announced changes to how models interact with functions:
Function calling
Developers can now describe functions to gpt-4-0613 and gpt-3.5-turbo-0613, and have the model intelligently choose to output a JSON object containing arguments to call those functions. This is a new way to more reliably connect GPT's capabilities with external tools and APIs.
These models have been fine-tuned to both detect when a function needs to be called (depending on the user’s input) and to respond with JSON that adheres to the function signature. Function calling allows developers to more reliably get structured data back from the model. For example, developers can:
Create chatbots that answer questions by calling external tools (e.g., like ChatGPT Plugins)
Convert queries such as “Email Anya to see if she wants to get coffee next Friday” to a function call like send_email(to: string, body: string), or “What’s the weather like in Boston?” to get_current_weather(location: string, unit: 'celsius' | 'fahrenheit').
Convert natural language into API calls or database queries
Convert “Who are my top ten customers this month?” to an internal API call such as get_customers_by_revenue(start_date: string, end_date: string, limit: int), or “How many orders did Acme, Inc. place last month?” to a SQL query using sql_query(query: string).
Extract structured data from text
Define a function called extract_people_data(people: [{name: string, birthday: string, location: string}]), to extract all people mentioned in a Wikipedia article.
These use cases are enabled by new API parameters in our /v1/chat/completions endpoint, functions and function_call, that allow developers to describe functions to the model via JSON Schema, and optionally ask it to call a specific function. Get started with our developer documentation, and add evals if you find cases where function calling could be improved.
This would make it pretty easy to host tools in ChatCraft that can be easily invoked by the model.
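For concreteness, a request body using the new `functions` and `function_call` parameters might look like the sketch below. The shape follows the announcement above; `get_current_weather` is the docs' own example, and everything else here is illustrative, not existing ChatCraft code:

```typescript
// Sketch of a /v1/chat/completions request body using the new
// `functions` and `function_call` parameters from the announcement.
const requestBody = {
  model: "gpt-3.5-turbo-0613",
  messages: [{ role: "user", content: "What's the weather like in Boston?" }],
  functions: [
    {
      name: "get_current_weather",
      description: "Get the current weather in a given location",
      parameters: {
        // JSON Schema describing the function's arguments
        type: "object",
        properties: {
          location: { type: "string", description: "City name, e.g. Boston" },
          unit: { type: "string", enum: ["celsius", "fahrenheit"] },
        },
        required: ["location"],
      },
    },
  ],
  // "auto" lets the model decide whether to call a function;
  // { name: "get_current_weather" } would force that specific call.
  function_call: "auto",
};
```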
https://twitter.com/nfcampos/status/1671212156711325696 has an example of using raw OpenAI functions with langchain.js
Note: OpenAI functions add a new `function` message type.
Langchain.js function call stuff: https://github.com/hwchase17/langchainjs/blob/53b6cb4fd9365e26b21c5e40ed2ddd8077562fdb/langchain/src/chat_models/openai.ts#L87
To me, tools are user-defined functions (interconnection functions, formatting functions) plus OpenAI's ability to call them, with minimal intervention from me to feed data back into GPT (and the ability to choose not to feed it back)... This, combined with our existing code-editing and retry-with-different-model features, should be pretty powerful.
So I think there is only one way to do this.
1) Add a second "system message" where we can edit the schema for OpenAI functions. These will likely be a catalog of built-in functions like showCode(code, language) and someDBOperations, plus user-defined functions like "runSQL". Add a flag to force functions to run, so "just show me the code" becomes a forced "renderCode" call.
2) When the model tells us to run code by replying with an "assistant" message that has "content": null and a "function_call", we render the function call similar to how we render code now... with per-function auto-run (e.g., renderCode is harmless).
3) If auto-run is specified, we immediately pipe results back. If not, we have the option of running the function and optionally sending results back (this is useful when using the model to generate SQL queries and no further interaction is needed). The same applies if we have a function to apply CloudFormation, etc.
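The loop in steps 2–3 could be sketched roughly like this. The registry, message shapes, and `autoRun` flag are my assumptions for illustration, not existing ChatCraft code:

```typescript
// Hypothetical registry of locally-implemented functions.
const registry: Record<string, { autoRun: boolean; fn: (args: any) => string }> = {
  renderCode: {
    autoRun: true, // harmless, so safe to auto-run
    fn: ({ code, language }) => `\`\`\`${language}\n${code}\n\`\`\``,
  },
};

// When the assistant replies with content: null and a function_call,
// run the named function and build the "function" role message to send back.
function handleFunctionCall(message: {
  role: "assistant";
  content: null;
  function_call: { name: string; arguments: string };
}) {
  const { name, arguments: rawArgs } = message.function_call;
  const entry = registry[name];
  if (!entry) throw new Error(`Unknown function: ${name}`);
  // The model sends arguments as a JSON string, so parse before dispatching.
  const result = entry.fn(JSON.parse(rawArgs));
  // If autoRun, the caller would immediately append this and re-query the model;
  // otherwise, show the result and let the user decide whether to send it back.
  return { role: "function" as const, name, content: result, autoRun: entry.autoRun };
}
```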
That covers all my immediate tool needs. It gets me bash, SQL, etc. consoles. Maybe even a code-writing agent that can modify code in a repository I expose via some API.
Let me think out-loud, based on what you wrote above.
First, a tool or function (I think "function" is probably closer to my concept of what this is) has the following:
A bunch of this is based on https://github.com/openai/openai-node/blob/dc821be3018c832650e21285bade265099f99efb/api.ts#L26-L50 and https://platform.openai.com/docs/api-reference/chat/create#chat/create-functions, but we can also colocate the implementation, since we'll run these functions in the browser.
Here's an example:
```js
{
  name: "echo",
  description: "echoes the input back",
  parameters: {
    type: "object",
    properties: {
      value: {
        type: "string",
        description: "The value to echo back",
      },
    },
    required: ["value"],
  },
  code: "function(value) {\n  return value;\n}",
}
```
We can put these functions in a new table in the database. This lets us re-use them across different chats, and refer to them by id
or something within a chat vs. having them live there. A shared chat could embed them so you can pass them around.
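A minimal sketch of what such a table record might look like, with an in-memory stand-in for the db so the lookups are concrete. The field names here are my guess, not ChatCraft's actual schema:

```typescript
// Hypothetical shape of a "functions" table record; the id lets chats
// reference a function without embedding it.
interface FunctionRecord {
  id: string;
  name: string;
  description: string;
  parameters: object; // JSON Schema for the arguments
  code: string;       // JS source, stored as a string (or a Blob)
  createdAt: Date;
}

// Minimal in-memory stand-in for the db table, showing the lookups
// a chat would need: by id (internal references from messages).
const functionsTable = new Map<string, FunctionRecord>();

function saveFunction(rec: FunctionRecord) {
  functionsTable.set(rec.id, rec);
}

function getFunctionById(id: string): FunctionRecord | undefined {
  return functionsTable.get(id);
}
```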
In terms of the UI for working with them, we could either do it all in the current message view, or we could introduce a new view. I think I'd do the latter. For example:
This isn't the right UI, but imagine having a way to easily add/remove functions by name in the system prompt message, and also to go write a new one:
- name
- description
- input arg schema (JSON Schema)
- implementation in JS
I slept on this. I think a tool is going to be something like a JS module plus TypeScript type annotations for it. We can use TypeScript tooling to convert that mapping into the JSON Schema that OpenAI requires.
So a tool would be an http://-loadable JS module with some metadata describing what's in it. While we're at it, we can also include a system prompt in that metadata bundle.
Thinking about this as an ESM module is kind of interesting. To take my previous example and rewrite it you might have this:
```ts
export const name = "echo";
export const description = "echoes the input back";
export const parameters = {
  type: "object",
  properties: {
    value: {
      type: "string",
      description: "The value to echo back",
    },
  },
  required: ["value"],
};

export default function (value: string) {
  return value;
}
```
Loading that out of the db is pretty easy to do, since we can make a Blob URL. I was thinking that the router can be extended to add a /f/:function-id route (similar to our /c/:chat-id).

You'd also be able to load shared tools remotely that way: chatcraft.org/f/:user/:function-id.
I don't think you'd want a system prompt in a function, since you're likely to need/want more than 1 function in a chat.
I was thinking we could pre-process modules written more naturally:

```ts
/**
 * echoes the input back
 * @param value The value to echo back
 */
export function echo(value: string) {
  return value;
}
```
I would like to be able to include multiple functions per module this way... but your way is much easier to implement.
We could combine system prompts when multiple modules are used
I think making single-function modules is more in the spirit of the APIs, where you provide an array of functions to use. I agree that you'll want multiple functions in lots of cases, but being able to compose them together would be more flexible. I also think that separating functions from one another, and also from the chat/messages, makes them more reusable.
In the UI, we could have a way to get GPT to generate the structured bits (name, params, etc.) based on the code you write. However, we should also provide a way to enter them manually.
So can we get practical?
Once those are in place, we can work on integrating the UI into the current chat, system prompt, etc.
I'm OK with single-function modules, but I think a prompt describing when to use the function should live with it.
So are you imagining this more like an "App Store" where these functions are vetted and stored centrally somewhere outside of ChatCraft? I was imagining us doing this without a central store, but I can see the value of both ways. This is why figuring out the db/routing/loading first was important to me, since we need them to have a home in your existing data.
I'm imagining it as programming Lego. A toolbox/app store is nice, but not required to use tools.
I'm going to continue to push to have this become more concrete so I can start coding.
"Programming Lego" is nice, and to me it implies the idea of individual, atomic, building blocks. So my proposal is:
I think the above would be enough to build a prototype to try, and then we can figure out editing, sharing, etc.
Sound good? What did I get wrong here?
2 and 3 are good. But I'm confused as to what "store locally in the db" means.
I mean we can store the actual function code in the db as either a string or a Blob, and then when we want to use it at runtime, we can:

```js
// Get a function from the db
const fn = await db.functions.get(id);
// Grab its "code", which is a Blob of the JS ES Module with type text/javascript
const blob = fn.code;
// Use that Blob to create a dynamic URL
const url = URL.createObjectURL(blob);
// Load the module at runtime and grab the function we want to use
const { myFunction } = await import(url);
// Call the function
await myFunction(inputData);
```
We can do that. I think I would eventually want to import external files, but that's a bigger feature.
I'm sold on your proposal
Another question before I start implementing this.
In the UI for creating/editing a function, I need to know how to handle metadata for a function. What I'm imagining is that you can go to chatcraft.org/f/new to create a function, or to chatcraft.org/f/:id (or maybe just use the function name as the id) to edit an existing one. I'm trying to decide how to do the layout for what you see when you get there.
I think we have two options:
Doing 1 is obviously easier from an implementation point of view. But is there value in trying to do everything via code? Without needing complex parsing, we could use what I suggested above and put everything into exports on the module:
```ts
export const name = "echo";
export const description = "echoes the input back";
export const parameters = {
  type: "object",
  properties: {
    value: {
      type: "string",
      description: "The value to echo back",
    },
  },
  required: ["value"],
};

export default function (value: string) {
  return value;
}
```
What's nice about this is that we can validate the module to make sure it's working before we save it to the db. When we save to the db, we can import() the module and extract the metadata to put into other fields.
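That validation step could be a simple check that the imported module has the exports we rely on. A sketch, where the exact required shape is just my assumption:

```typescript
// Check that a dynamically imported module has the exports we rely on,
// before persisting it to the db. `mod` is whatever `await import(url)` returned.
function validateFunctionModule(mod: any): string[] {
  const problems: string[] = [];
  if (typeof mod.name !== "string") problems.push("missing `name` export");
  if (typeof mod.description !== "string") problems.push("missing `description` export");
  if (typeof mod.parameters !== "object" || mod.parameters === null)
    problems.push("missing `parameters` export");
  if (typeof mod.default !== "function") problems.push("missing default export function");
  return problems; // an empty array means the module looks usable
}
```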
Which way should I go?
I vote for 2.
Started playing with this, and it's going to work:
In the screenshot above, you can see a code editor, and I'm able to parse the module, dynamically import it, and extract the features I want (notice the name and description being used dynamically in the header).
Code to do the extraction is this:
```ts
const parseModule = async (code: string) => {
  const blob = new Blob([code], { type: "text/javascript" });
  const url = URL.createObjectURL(blob);
  try {
    // Await here so errors are caught below, and so the URL isn't
    // revoked by the finally block before the import completes.
    return await import(/* @vite-ignore */ url);
  } catch (err: any) {
    console.warn("Unable to parse module", err);
    throw new Error(`Unable to parse module: ${err.message}`);
  } finally {
    URL.revokeObjectURL(url);
  }
};
```
I'll keep playing and post a PR this week.
Microsoft released a library to do something similar to what we are doing here https://github.com/microsoft/TypeChat/tree/4d34a5005c67bc49444e6e6d016a9262cf24b38d
This is done!
We've spoken a lot about "tools," and the abilities they'd unlock. Now that we have the message types in place, routing, and the db backend, I think we have most of the pieces necessary to start playing around with this.
It likely makes sense to consider this in the context of Langchain, which already gives us access to Agents and Tools. Here, a "tool" means:
So we could begin by providing a way to write/upload/store JS (or TS and transpile, or I guess whatever language and compile to WASM) which exports a call(arg: string): Promise<string> function.

We now have /c/* for chats. What about /t/* for tools?

How do we use these tools? Do we create a Toolkit for an agent to use, based on all the /t/* tools owned by the user? Or do we have some way to insert a Tool as context in a message? Or both?
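A minimal tool module in that Langchain sense might look like the following. This is purely illustrative; the word-count tool and its metadata exports are my invention, not an existing ChatCraft or Langchain tool:

```typescript
// A minimal tool module: a single async call(arg: string): Promise<string>
// entry point, plus metadata describing the tool to the agent.
export const name = "word-count";
export const description = "Counts the words in the input string";

export async function call(arg: string): Promise<string> {
  // Tools take a string in and return a string, so serialize the count.
  const count = arg.trim().split(/\s+/).filter(Boolean).length;
  return String(count);
}
```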