withcatai / node-llama-cpp

Run AI models locally on your machine with Node.js bindings for llama.cpp. Enforce a JSON schema on the model output at the generation level.
https://node-llama-cpp.withcat.ai
MIT License

feat: automatic batching #104

Closed giladgd closed 9 months ago

giladgd commented 9 months ago

Description of change

BREAKING CHANGE: completely new API (docs will be updated before a stable version is released)

Closes #85 Fixes #102 Fixes #94 Fixes #93 Fixes #76

Things left to do (in other PRs)

Pull-Request Checklist

github-actions[bot] commented 9 months ago

:tada: This PR is included in version 3.0.0-beta.1 :tada:

The release is available on:

Your semantic-release bot :package::rocket:

Madd0g commented 4 months ago

Is there a code snippet that shows how to correctly use batching? I'm doing repetitive things in a loop and am wondering how I might take advantage of this.

giladgd commented 4 months ago

@Madd0g There will be a better example in the documentation when version 3 leaves the beta status soon, but for now, here's a simple example:

```typescript
import {fileURLToPath} from "url";
import path from "path";
import {getLlama, LlamaChatSession} from "node-llama-cpp";

const __dirname = path.dirname(fileURLToPath(import.meta.url));

const llama = await getLlama();
const model = await llama.loadModel({
    modelPath: path.join(__dirname, "models", "dolphin-2.1-mistral-7b.Q4_K_M.gguf")
});

// A single context with 2 sequences; evaluations on sequences of the
// same context are batched together automatically
const context = await model.createContext({
    sequences: 2
});

const sequence1 = context.getSequence();
const sequence2 = context.getSequence();

const session1 = new LlamaChatSession({
    contextSequence: sequence1
});
const session2 = new LlamaChatSession({
    contextSequence: sequence2
});

const q1 = "Hi there, how are you?";
const q2 = "How much is 6+6?";

// Prompting both sessions concurrently lets their evaluations be batched
const [
    a1,
    a2
] = await Promise.all([
    session1.prompt(q1),
    session2.prompt(q2)
]);

console.log("User: " + q1);
console.log("AI: " + a1);

console.log("User: " + q2);
console.log("AI: " + a2);
```

The batching is done automatically across sequences of the same context.
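For the "repetitive things in a loop" use case, one way to apply the pattern above is to create one sequence per desired degree of parallelism and feed chunks of prompts through them with `Promise.all`. This is only a sketch assuming the v3 beta API shown above; the model path, prompt list, and `parallelism` value are placeholders, and note that each session keeps its chat history across iterations, which may or may not be what you want:

```typescript
import {fileURLToPath} from "url";
import path from "path";
import {getLlama, LlamaChatSession} from "node-llama-cpp";

const __dirname = path.dirname(fileURLToPath(import.meta.url));

const llama = await getLlama();
const model = await llama.loadModel({
    // placeholder path - point this at your own model file
    modelPath: path.join(__dirname, "models", "dolphin-2.1-mistral-7b.Q4_K_M.gguf")
});

const prompts = ["Question 1", "Question 2", "Question 3", "Question 4"];
const parallelism = 2;

// One sequence per concurrent prompt; they all share one context,
// so their evaluations get batched together
const context = await model.createContext({sequences: parallelism});
const sessions = Array.from({length: parallelism}, () =>
    new LlamaChatSession({contextSequence: context.getSequence()})
);

const answers: string[] = [];
// Process the prompts in chunks of `parallelism`; within each chunk
// the prompts run concurrently and are batched automatically
for (let i = 0; i < prompts.length; i += parallelism) {
    const chunk = prompts.slice(i, i + parallelism);
    const chunkAnswers = await Promise.all(
        chunk.map((prompt, j) => sessions[j].prompt(prompt))
    );
    answers.push(...chunkAnswers);
}

console.log(answers);
```

If each prompt should be independent, you'd want a fresh chat session (or cleared history) per chunk instead of reusing the same sessions.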