withcatai / node-llama-cpp

Run AI models locally on your machine with node.js bindings for llama.cpp. Enforce a JSON schema on the model output on the generation level
https://node-llama-cpp.withcat.ai
MIT License

Supporting minItems and others in LlamaJsonSchemaGrammar #384

Open TrevorSundberg opened 1 week ago

TrevorSundberg commented 1 week ago

Feature Description

It looks like LlamaJsonSchemaGrammar had been introduced before llama.cpp had an API for it: https://github.com/ggerganov/llama.cpp/blob/74d73dc85cc2057446bf63cc37ff649ae7cebd80/common/json-schema-to-grammar.h#L8

With LlamaJsonSchemaGrammar, I ran into minItems in my schema not being respected, but after looking into llama.cpp, it appears it does support minItems: https://github.com/ggerganov/llama.cpp/blob/74d73dc85cc2057446bf63cc37ff649ae7cebd80/common/json-schema-to-grammar.cpp#L972
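For illustration, a schema along these lines (a made-up example; the property name is a placeholder, and llama/llm are assumed to already be set up) currently still allows an empty array:

const schema = {
    type: "object",
    properties: {
        names: {
            type: "array",
            items: {
                type: "string"
            },
            minItems: 1 // currently has no effect on the generated grammar
        }
    }
};

const grammar = new llm.LlamaJsonSchemaGrammar(llama, schema as any);
// The generated grammar still allows {"names": []}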

I was thinking it might be beneficial (but hopefully not breaking) to switch to their version rather than maintain your own.

The Solution

Switch to using json_schema_to_grammar from llama.cpp. If this would be a breaking change, it could instead be exposed as an alternate API, like new llm.LlamaJsonSchemaCppGrammar(llama, schema), with LlamaJsonSchemaCppGrammar perhaps deriving from LlamaJsonSchemaGrammar.

Considered Alternatives

Implement minItems in the grammar generator, which would affect generated rules like this:

rule14 ::= ( rule1 ) ( "," whitespace-b-2-4-rule rule1 )*
rule15 ::= ( rule1 )?
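
For example, a minItems: 2 constraint on that list could be expressed by unrolling the first repetition (an illustrative sketch, not the output of any existing implementation):

rule14 ::= ( rule1 ) ( "," whitespace-b-2-4-rule rule1 ) ( "," whitespace-b-2-4-rule rule1 )*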

Additional Context

No response

Related Features to This Feature Request

Are you willing to resolve this issue by submitting a Pull Request?

No, I don’t have the time, but I can support development (with donations).

giladgd commented 5 days ago

I created the JSON schema grammar implementation in node-llama-cpp before it was introduced into llama.cpp, but I kept it afterward to provide better vertical support for it throughout the entire library, including validation and full TypeScript support. Also, when using function calling, the JSON schema is converted to the various formats in which models expect to receive function definitions, so having a separate implementation allows for enhanced vertical support for each JSON schema feature throughout all the relevant components.

As I explained in the documentation, many JSON schema features are intentionally not supported, since they don't align well with the way models generate output, and using them is prone to hallucinations. While it is possible to force the model to follow more JSON schema options (like the llama.cpp implementation offers), I opted not to do that, to encourage people to compose their JSON schemas in a way that aligns better with how models generate output, achieving higher-quality generations rather than wasting time optimizing the wrong things to get them to work.

Regarding the minItems feature, the issue is that when you use it to force the model to generate an array with a minimum number of elements, the model isn't aware of this requirement beforehand and thus cannot "plan" the entire content of the array in advance, which may lead it to generate inconsistent and unevenly spread items. It can also lead the model to repeat existing values in different forms, or to generate an excessively long array as it attempts to continue the pattern that was forced upon it.

A better approach would be to tell the model the requirement in advance (as part of the prompt), or force it to generate a value indicative of the continuation of the output you want it to generate.

Here's an example of how you can do that by prompting the model:

// Assumes llama (from getLlama()) and chat (a LlamaChatSession) are already set up
const paragraph = "...";
const grammar = await llama.createGrammarForJsonSchema({
    type: "object",
    properties: {
        personNamesInTheGivenParagraph: {
            type: "array",
            items: {
                type: "string"
            }
        }
    }
});

const res = await chat.prompt([
    `Here is a paragraph: ${paragraph}`,
    "Extract at least 10 person names from the paragraph"
].join("\n\n"), {grammar});

const parsedRes = grammar.parse(res);
console.log(parsedRes.personNamesInTheGivenParagraph);

And here's how you can do that as part of the schema itself. The trick is that you force the model to commit to at least 10 results before it generates the results themselves:

const paragraph = "...";
const grammar = await llama.createGrammarForJsonSchema({
    type: "object",
    properties: {
        minimumPersonNamesInTheGivenParagraph: {
            const: 10
        },
        personNamesInTheGivenParagraph: {
            type: "array",
            items: {
                type: "string"
            }
        }
    }
});

const res = await chat.prompt(`Here is a paragraph: ${paragraph}`, {grammar});

const parsedRes = grammar.parse(res);
console.log(parsedRes.personNamesInTheGivenParagraph);

The minimumPersonNamesInTheGivenParagraph property must be defined before the other properties to force the model to generate it first.

Also, the JSON schema conversion implementation in llama.cpp includes support for including schemas from an external URL. While it does seem pretty neat, this also poses a significant security risk, and even though it's disabled by default, having it in the codebase doesn't give me comfort, since any future change in that functionality could have a wider effect on the security of node-llama-cpp. My main concerns with that functionality are making an HTTP request directly from the native code and parsing arbitrary remote text (presumably a JSON schema, but not necessarily), both of which I think are better handled on the JS level for improved security and flexibility.
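For example (a rough sketch, not an existing node-llama-cpp API; someSchemaWithRemoteRefs is a placeholder), external references could be resolved on the JS level before the schema reaches the grammar builder, keeping the fetching and parsing under the application's control:

// Hypothetical helper: inline any remote $ref found in the schema before building the grammar.
// Assumes each referenced URL returns a plain JSON schema document (uses the global fetch of Node 18+).
async function inlineRemoteRefs(schema: any): Promise<any> {
    if (schema == null || typeof schema !== "object")
        return schema;

    if (typeof schema.$ref === "string" && /^https?:\/\//.test(schema.$ref)) {
        const response = await fetch(schema.$ref);
        return inlineRemoteRefs(await response.json());
    }

    for (const [key, value] of Object.entries(schema))
        schema[key] = await inlineRemoteRefs(value);

    return schema;
}

const resolvedSchema = await inlineRemoteRefs(someSchemaWithRemoteRefs);
const grammar = await llama.createGrammarForJsonSchema(resolvedSchema);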

Let me know whether it helped you or if your particular use case isn't compatible with the above workaround. I can always add support for more JSON schema features, but I'd like to do it in a way that directs people to use it correctly to ensure nobody would have to spend time messing with hallucinations due to grammar issues.

TrevorSundberg commented 5 days ago

Understood on minimumPersonNamesInTheGivenParagraph; often what I do beforehand is give the model an example with comments so that the LLM understands the output I'd like, along with the grammar so that the LLM is constrained to the output I need. However, the issue with getting the LLM to "commit to at least 10 results" is that... well, there's no guarantee of that. In my case, I was prompting it and telling it to always generate at least one item, but I'd still find cases where smaller models ignore the instruction and generate an empty array.

In order to fix it, I had to write these lines:

  const outputVideoGrammarEx = new llm.LlamaJsonSchemaGrammar(llama, schema as any);
  // HACK, https://github.com/withcatai/node-llama-cpp/issues/384
  // Rewrite the generated rule so the array item is required rather than optional (effectively minItems: 1)
  const outputVideoGrammar = new llm.LlamaGrammar(llama, {
    grammar: outputVideoGrammarEx.grammar.replace("rule15 ::= ( rule1 )?", "rule15 ::= ( rule1 )"),
  });

I understand your point about hallucinations; however, I've had a lot of success using the more advanced features in llama.cpp (it also has a Python version of the JSON schema to grammar converter that I used before, including features like minItems). Feeding an example plus a constrained grammar seems to work well. I don't think the possibility of hallucinations should be a reason not to provide access to the llama.cpp API.

The llama.cpp version itself does not do any HTTP fetching. You pass in a function callback to SchemaConverter that takes in a URL and returns a JSON object. It's up to the caller to implement the HTTP request. In json_schema_to_grammar, it passes in an empty callback that always returns an empty JSON object without an actual HTTP fetch. No security issue there.

Thanks for taking a look into this btw! :)