withcatai / node-llama-cpp

Run AI models locally on your machine with Node.js bindings for llama.cpp. Force a JSON schema on the model output at the generation level.
https://withcatai.github.io/node-llama-cpp/
MIT License

feat: completion and infill #164

Closed · giladgd closed this pull request 5 months ago

giladgd commented 5 months ago

Description of change

Infill, also known as fill-in-middle (FIM), generates a completion for an input that should connect to a given continuation. For example, given the prefix input 123 and the suffix input 789, the model is expected to generate 456 so that the final text reads 123456789.

Not every model supports infill, so only models that do can be used to generate one.
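Under the hood, infill-capable models are trained with special fill-in-middle tokens that mark where the prefix and suffix belong in the prompt. As a rough illustration only (not the library's actual implementation), here is roughly what such a prompt looks like; the token names below follow the StarCoder-family convention used by models like stable-code and differ between model families:

```js
// Illustrative sketch, not node-llama-cpp internals: how a FIM prompt is
// typically assembled for StarCoder-family models. Other model families
// use different special tokens.
function buildFimPrompt(prefix, suffix) {
    return `<fim_prefix>${prefix}<fim_suffix>${suffix}<fim_middle>`;
}

// The model then generates the "middle" text that connects the two parts:
// buildFimPrompt("123", "789") => "<fim_prefix>123<fim_suffix>789<fim_middle>"
// expected generation: "456"
```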

How to generate a completion

```js
import {fileURLToPath} from "url";
import path from "path";
import {getLlama, LlamaModel, LlamaContext, LlamaCompletion} from "node-llama-cpp";

const __dirname = path.dirname(fileURLToPath(import.meta.url));

const llama = await getLlama();

// Load the model from a local GGUF file
const model = new LlamaModel({
    llama,
    modelPath: path.join(__dirname, "models", "stable-code-3b.Q5_K_M.gguf")
});

// Cap the context size at 4096 tokens, or lower if the model
// was trained with a smaller context
const context = new LlamaContext({
    model,
    contextSize: Math.min(4096, model.trainContextSize)
});
const completion = new LlamaCompletion({
    contextSequence: context.getSequence()
});

const input = "const arrayFromOneToTwenty = [1, 2, 3,";
console.log("Input: " + input);

const res = await completion.generateCompletion(input);
console.log("Completion: " + res);
```

In this example I used this model (stable-code-3b.Q5_K_M.gguf).
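generateCompletion also accepts an options object as a second argument for controlling generation. As a minimal sketch continuing the example above, assuming the maxTokens and temperature options of the beta API (check the typings if these names don't match your version):

```js
// Sketch continuing the example above; option names are assumptions
// based on the beta API and may differ between versions.
const boundedRes = await completion.generateCompletion(input, {
    maxTokens: 40,      // stop after at most 40 generated tokens
    temperature: 0.8    // sample with some randomness instead of greedy decoding
});
console.log("Bounded completion: " + boundedRes);
```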

How to generate an infill

```js
import {fileURLToPath} from "url";
import path from "path";
import {getLlama, LlamaModel, LlamaContext, LlamaCompletion, UnsupportedError} from "node-llama-cpp";

const __dirname = path.dirname(fileURLToPath(import.meta.url));

const llama = await getLlama();
const model = new LlamaModel({
    llama,
    modelPath: path.join(__dirname, "models", "stable-code-3b.Q5_K_M.gguf")
});
const context = new LlamaContext({
    model,
    contextSize: Math.min(4096, model.trainContextSize)
});
const completion = new LlamaCompletion({
    contextSequence: context.getSequence()
});

// Bail out early if the loaded model wasn't trained for fill-in-middle
if (!completion.infillSupported)
    throw new UnsupportedError("Infill completions are not supported by this model");

const prefix = "const arrayFromOneToFourteen = [1, 2, 3, ";
const suffix = "10, 11, 12, 13, 14];";
console.log("prefix: " + prefix);
console.log("suffix: " + suffix);

// Generate the text that connects the prefix to the suffix
const res = await completion.generateInfillCompletion(prefix, suffix);
console.log("Infill: " + res);
```

In this example I used this model (stable-code-3b.Q5_K_M.gguf).
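With the inputs above, an infill-capable model would be expected to bridge the gap with something like `4, 5, 6, 7, 8, 9, `, so that prefix + infill + suffix reads as the complete array literal. As with any sampled output, the exact text can vary between runs.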


github-actions[bot] commented 5 months ago

🎉 This PR is included in version 3.0.0-beta.11 🎉

The release is available on:

Your semantic-release bot 📦🚀