withcatai / node-llama-cpp

Run AI models locally on your machine with Node.js bindings for llama.cpp. Force a JSON schema on the model output at the generation level.
https://withcatai.github.io/node-llama-cpp/
MIT License

Response streaming in 3.0.0 beta version #213

Closed: Reyons227 closed this 2 months ago

Reyons227 commented 2 months ago

Issue description

Response streaming is not working in the 3.0.0 beta. Error: decode not found

Expected Behavior

Response streaming should work as it did in 2.8.10.

Actual Behavior

Response streaming doesn't work as expected: the methods it relied on cannot be found in the beta, and I cannot find a guide on how to stream a response in the 3.0.0 beta, since the same code from the stable version no longer works.

Steps to reproduce

Use the same response-streaming code that worked in 2.8.10, then upgrade to the beta; the same code no longer works:

import {fileURLToPath} from "url";
import path from "path";
import {
    LlamaModel, LlamaContext, LlamaChatSession, Token
} from "node-llama-cpp";

const __dirname = path.dirname(fileURLToPath(import.meta.url));

const model = new LlamaModel({
    modelPath: path.join(__dirname, "models", "codellama-13b.Q3_K_M.gguf")
});
const context = new LlamaContext({model});
const session = new LlamaChatSession({context});

const q1 = "Hi there, how are you?";
console.log("User: " + q1);

process.stdout.write("AI: ");
const a1 = await session.prompt(q1, {
    onToken(chunk: Token[]) {
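        // fails in the 3.0.0 beta: context.decode() is no longer available,
        // surfacing as the "decode not found" error reported above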
        process.stdout.write(context.decode(chunk));
    }
});

My Environment

Dependency              Version
Operating System        Windows 10
CPU                     Ryzen 5 7600
Node.js version         v20.9.0
TypeScript version      latest
node-llama-cpp version  3.0.0

Additional Context

No response

Relevant Features Used

Are you willing to resolve this issue by submitting a Pull Request?

No, I don’t have the time and I’m okay to wait for the community / maintainers to resolve this issue.

iimez commented 2 months ago

Should probably be model.detokenize(tokens) now.
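
Assuming the rest of the setup has already been migrated to the beta API, the streaming callback from the reproduction above would then look roughly like this (a sketch, where model is the loaded model instance):

const a1 = await session.prompt(q1, {
    onToken(chunk: Token[]) {
        // in the 3.0.0 beta, tokens are turned back into text via the model,
        // replacing context.decode() from 2.8.x
        process.stdout.write(model.detokenize(chunk));
    }
});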

giladgd commented 2 months ago

I've updated the version 3 beta PR to include an example of how to stream a response.
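
The PR example itself isn't quoted in this thread; below is a rough end-to-end sketch of streaming against the 3.0.0 beta API, adapted from the reproduction code above. It assumes the beta's getLlama/loadModel/createContext/getSequence flow, so treat the exact option names as assumptions until checked against the PR:

import {fileURLToPath} from "url";
import path from "path";
import {getLlama, LlamaChatSession, Token} from "node-llama-cpp";

const __dirname = path.dirname(fileURLToPath(import.meta.url));

// in the 3.0.0 beta, models and contexts are created through a Llama instance
// rather than with the LlamaModel/LlamaContext constructors from 2.8.x
const llama = await getLlama();
const model = await llama.loadModel({
    modelPath: path.join(__dirname, "models", "codellama-13b.Q3_K_M.gguf")
});
const context = await model.createContext();
const session = new LlamaChatSession({
    contextSequence: context.getSequence()
});

const q1 = "Hi there, how are you?";
console.log("User: " + q1);

process.stdout.write("AI: ");
const a1 = await session.prompt(q1, {
    onToken(chunk: Token[]) {
        // tokens are detokenized through the model rather than the context
        process.stdout.write(model.detokenize(chunk));
    }
});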