withcatai / node-llama-cpp

Run AI models locally on your machine with node.js bindings for llama.cpp. Force a JSON schema on the model output on the generation level
https://node-llama-cpp.withcat.ai
MIT License

feat: support Functionary new chat format #299

Open physimo opened 3 weeks ago

physimo commented 3 weeks ago

Issue description

Function calling with the Functionary model doesn't work, because the model uses a different token than usual to emit its function calls.

Expected Behavior

I tried asking the model to use one of the functions I passed in via functions:

The function (csmf is my local shorthand for defineChatSessionFunction; the full reproduction below uses the real name)

const evalJavaScript = csmf({
    description: "Evaluate a JavaScript code.",
    params: {
        type: "object",
        properties: {
            code: {
                type: "string",
                description: "JavaScript code to evaluate."
            }
        }
    },
    handler(params: any) {
        console.log("[evalJavaScript called]");
        console.log(params);

        try {
            const hrStart = process.hrtime();
            const lastResult = eval(params.code);
            const hrDiff = process.hrtime(hrStart);
            return { error: false, execution_time: `${hrDiff[0] > 0 ? `${hrDiff[0]}s ` : ''}${hrDiff[1] / 1000000}ms`, result: lastResult }
        }
        catch (err) {
            return { error: true, reason: err }
        }
    }
})

chat_functions['evalJavaScript'] = evalJavaScript;

The prompt

Can you try evaluating this javascript code?

Math.round(Math.random() * 100)

The expected behavior is that the model calls the function, which evaluates the given JavaScript code and returns a random number to the model.

Actual Behavior

The model tried to call the function with the given parameters, but failed to do so because it apparently emits the function call in a peculiar format.

This is the actual given response from the model:

Sure, I can do that. Let's evaluate the JavaScript code `Math.round(Math.random() * 100)`.>>>evalJavaScript({"code": "Math.round(Math.random() * 100)"})


Steps to reproduce

Using the template from npm create --yes node-llama-cpp@beta and the Functionary Small v3.2 model:

import { fileURLToPath } from "url";
import path from "path";
import chalk from "chalk";
import { getLlama, LlamaChatSession, ChatSessionModelFunction, defineChatSessionFunction } from "node-llama-cpp";

const __dirname = path.dirname(fileURLToPath(import.meta.url));
const modelsFolderDirectory = path.join(__dirname, "..", "models");
const chat_functions: { [function_name: string]: ChatSessionModelFunction<any> } = {};
const evalJavaScript = defineChatSessionFunction({
    description: "Evaluate a JavaScript code.",
    params: {
        type: "object",
        properties: {
            code: {
                type: "string",
                description: "JavaScript code to evaluate."
            }
        }
    },
    handler(params: any) {
        console.log("[evalJavaScript called]");
        console.log(params);

        try {
            const hrStart = process.hrtime();
            const lastResult = eval(params.code);
            const hrDiff = process.hrtime(hrStart);
            return { error: false, execution_time: `${hrDiff[0] > 0 ? `${hrDiff[0]}s ` : ''}${hrDiff[1] / 1000000}ms`, result: lastResult }
        }
        catch (err) {
            return { error: true, reason: err }
        }
    }
})
chat_functions['evalJavaScript'] = evalJavaScript;

const llama = await getLlama();

console.log(chalk.yellow("Loading model..."));
const model = await llama.loadModel({
    modelPath: path.join(modelsFolderDirectory, "functionary-small-v3.2.F16.gguf")
});

console.log(chalk.yellow("Creating context..."));
const context = await model.createContext();

const session = new LlamaChatSession({
    contextSequence: context.getSequence()
});
console.log();

const q1 = `
Can you try evaluating this javascript code?

Math.round(Math.random() * 100)`.trim();
console.log(chalk.yellow("User: ") + q1);

process.stdout.write(chalk.yellow("AI: "));
const a1 = await session.prompt(q1, {
    functions: chat_functions,
    onTextChunk(chunk) {
        // stream the response to the console as it's being generated
        process.stdout.write(chunk);
    }
});
process.stdout.write("\n");
console.log(chalk.yellow("Consolidated AI answer: ") + a1);
console.log();

process.exit(0);

My Environment

Operating System: Windows 10
CPU: Ryzen 3 2200G
GPU: RTX 3090
Node.js version: 20.10.0
TypeScript version: 5.4.5
node-llama-cpp version: 3.0.0-beta.44

Additional Context

I'm sorry if I'm mistaken about this issue, whether it's a bug or just my inexperience showing, or whether it's a Functionary problem rather than a node-llama-cpp problem. I've looked through the issues and the beta discussion, but none of them mention anything like this, so I had to open this issue.

Relevant Features Used

Are you willing to resolve this issue by submitting a Pull Request?

No, I don’t have the time and I’m okay to wait for the community / maintainers to resolve this issue.

giladgd commented 3 weeks ago

Functionary changes their chat template format too frequently, so I haven't been able to keep up with their pace of change.

From my recent, more in-depth tests of their models, I found that they suffer from overfitting issues, where the order of items in the prompt determines the response instead of the meaning of the prompt. For example, asking the model whether a $6 item is more expensive than a $4 item gets a response saying it is, while asking whether a $4 item is more expensive than a $6 item also gets a response saying that the $4 item is more expensive - the first item mentioned is always considered more expensive.

Functionary models were useful when function calling wasn't a feature of Llama models, but since Llama 3.1 is out (which supports function calling natively) there's no benefit to using Functionary anymore, since Functionary models are based on Llama anyway.

Because of this, I'm considering dropping support for Functionary models altogether, to avoid having people try them out and be disappointed when they learn that better options, like Llama 3.1, exist.

I recommend selecting Llama 3.1 when running npm create --yes node-llama-cpp@beta, and I'll update the documentation to recommend Llama 3.1 instead.

physimo commented 3 weeks ago

Hi @giladgd, thanks for the reply!

It's true that Llama 3.1 now supports function calling; I've tested it myself, and it works quite well.

However, it struggled with certain functions in specific contexts. As a result, it failed to fill in the required parameters correctly and the function returned an error, which led to the model looping: it kept calling the function with different, but less coherent, parameters on each iteration. This led me to look for a better alternative, which brought me back to the Functionary model.

From what I see on their GitHub repository page, it looks like the latest Functionary Small model performs almost as well as Llama 3.1 70B-Instruct*, so I want to give it a try.

If possible, could you please guide me on how to get this working? Specifically, I'd like to know if there's a way to call the function manually and then return the result to the model manually as well. From what I can see, I could do this by parsing the model's response whenever it tries to call a function, which, as shown above, is prefixed with >>>. I could then separate the model's text from the function call, run the function, and return the result to the model for further processing.

This might result in two replies from the model, but I can work with that, or even combine the replies into one (although it might not be pretty). For now, could you tell me how to manually call a function and pass the result back to the model? A rough sketch of what I mean is below. Thanks in advance!
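Something like this is the kind of manual wiring I have in mind. It's only a sketch on top of my reproduction code above, not a real node-llama-cpp feature: it assumes the call always appears as >>>name({...}) at the end of the response, and that the object returned by defineChatSessionFunction keeps the handler I passed in.

// Sketch only: detect a trailing ">>>functionName({...})" pattern in the raw
// response, run the matching handler from my chat_functions map, and feed the
// result back to the model in a follow-up prompt.
// Reuses `session`, `chat_functions` and `q1` from the reproduction code above.
const raw = await session.prompt(q1);

// Naive parse of a trailing ">>>name({...})" call (assumption: the call always
// appears in exactly this shape, which may not hold in practice).
const match = raw.match(/>>>(\w+)\((\{.*\})\)\s*$/s);

let answer = raw;
if (match != null) {
    const [, functionName, rawParams] = match;
    const fn = chat_functions[functionName];

    if (fn != null) {
        // Assumption: defineChatSessionFunction returns the same object I passed in,
        // so the handler I defined is still reachable here.
        const result = await fn.handler(JSON.parse(rawParams));

        // Keep the model's visible text, drop the call syntax, and ask the model
        // to continue with the function result injected as plain text.
        const visibleText = raw.slice(0, match.index).trim();
        const followUp = await session.prompt(
            `The function ${functionName} returned: ${JSON.stringify(result)}. ` +
            "Please continue answering my previous request using this result."
        );
        answer = `${visibleText}\n${followUp}`;
    }
}

console.log(answer);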

*Edit: it looks like the table has changed since I last checked, so here's the previous table of their evaluation comparison against other models.

giladgd commented 3 weeks ago

@physimo Can you please share with me a simple code example of a scenario where the model loops infinitely? I might be able to think of a solution to that issue or implement a mitigation for this in node-llama-cpp.

I'll take a look at the new Functionary chat template in the next few days to add support for it. The documentation of version 3 will include a detailed explanation of how to create a custom chat wrapper with function calling (I'm actively working on it), but it's not ready yet.

In the meantime, you can either:


Update: after a quick look, it seems that the latest Functionary models support Llama 3.1 syntax, so dropping support for a custom Functionary chat wrapper may actually be the right solution to this issue.

To use it right now, you can force a LlamaChatSession to use Llama 3.1 chat wrapper like this:

import {fileURLToPath} from "url";
import path from "path";
import {getLlama, LlamaChatSession, Llama3_1ChatWrapper} from "node-llama-cpp";

const __dirname = path.dirname(fileURLToPath(import.meta.url));

const llama = await getLlama();
const model = await llama.loadModel({
    modelPath: path.join(__dirname, "models", "functionary-small-v3.2.F16.gguf")
});
const context = await model.createContext();
const session = new LlamaChatSession({
    contextSequence: context.getSequence(),
    chatWrapper: new Llama3_1ChatWrapper()
});

const q1 = "Hi there, how are you?";
console.log("User: " + q1);

const a1 = await session.prompt(q1);
console.log("AI: " + a1);
physimo commented 3 weeks ago

Hi @giladgd! I'm sorry for the late reply.

I've tried using the Llama 3.1 chat wrapper with the Functionary model, but it seems like it still cannot call the function properly (see attached screenshot). Although I haven't tried the other 2 suggestions you gave, I just want to point this out first to give you a quick response.

For the simple code of where the model loops infinitely, I'll try to make it as simple as possible here:

So I asked my friend to try making the model call a function (the evalJavaScript function) that should only be allowed for me. The function has these 2 parameters:

    description: "Evaluate a JavaScript code. Only, and only intended for Owner usage.",
    params: {
        type: "object",
        properties: {
            requester_username: {
                type: "string",
                description: "Username of requester"
            },
            code: {
                type: "string",
                description: "JavaScript code to evaluate."
            }
        }
    },

Then the function does a quick check to see if the requester's username is mine, and returns a pseudo error if not:

        if (requester_username != owner_username) {
            return { error: true, reason: "This user is not permitted to use this function." }
        }

At first, the model (Meta Llama 3.1 Instruct) managed to properly extract the context and fill in the parameters correctly, but after getting the error result it kept trying to call the function with less coherent parameters...
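For clarity, putting those fragments together, the full definition looked roughly like this (owner_username is just a placeholder constant standing in for my actual username):

// Roughly the full owner-only function assembled from the fragments above.
// owner_username is a placeholder for my own username.
const owner_username = "owner";

const evalJavaScript = defineChatSessionFunction({
    description: "Evaluate a JavaScript code. Only, and only intended for Owner usage.",
    params: {
        type: "object",
        properties: {
            requester_username: {
                type: "string",
                description: "Username of requester"
            },
            code: {
                type: "string",
                description: "JavaScript code to evaluate."
            }
        }
    },
    handler(params: any) {
        // Pseudo error returned to the model when the requester isn't the owner;
        // this is the result that sent Llama 3.1 into its retry loop.
        if (params.requester_username != owner_username) {
            return { error: true, reason: "This user is not permitted to use this function." };
        }

        try {
            return { error: false, result: eval(params.code) };
        }
        catch (err) {
            return { error: true, reason: err };
        }
    }
});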

physimo commented 3 weeks ago

I managed to make the model call a function by using TemplateChatWrapper:

const chatWrapper = new TemplateChatWrapper({
    template: "{{systemPrompt}}\n{{history}}\nmodel:{{completion}}\nuser:",
    historyTemplate: "{{roleName}}: {{message}}\n",
    modelRoleName: "model",
    userRoleName: "user",
    functionCallMessageTemplate:{
        call: ">>>{{functionName}}({{functionParams}})",
        result: "[[result: {{functionCallResult}}]]"
    }
})
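
For completeness, I then pass the wrapper to the session the same way as in your Llama 3.1 example above, and prompt with the same functions object from my reproduction code:

// Same context setup as in my reproduction code, just with the custom chat wrapper.
const session = new LlamaChatSession({
    contextSequence: context.getSequence(),
    chatWrapper
});

const a1 = await session.prompt(q1, {
    functions: chat_functions
});
console.log("AI: " + a1);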

As you can see in the attached screenshot, though, I think it needs a little bit of fine-tuning to make the model return a clean response here. Right now I haven't made any change except the functionCallMessageTemplate, which I copy-pasted straight from your example here, so for now I'll try to make the model return a clean response.

Thanks for your help @giladgd ! I'll leave closing this issue to you.