withcatai / node-llama-cpp

Run AI models locally on your machine with Node.js bindings for llama.cpp. Force a JSON schema on the model output at the generation level.
https://withcatai.github.io/node-llama-cpp/
MIT License

Response streaming in 3.0.0 beta version #213

Closed: Reyons227 closed this 2 months ago

Reyons227 commented 2 months ago

Issue description

Response streaming is not working in the 3.0.0 beta. Error: decode not found

Expected Behavior

Response streaming should work as it did in 2.8.10.

Actual Behavior

Response streaming doesn't work as expected: the methods it relied on cannot be found in the beta, and I cannot find a guide on how to stream a response in the 3.0.0 beta, since the same code from the stable version no longer works.

Steps to reproduce

Use the same response-streaming code that worked in 2.8.10, then upgrade to the beta; the same code no longer works:

import {fileURLToPath} from "url";
import path from "path";
import {
    LlamaModel, LlamaContext, LlamaChatSession, Token
} from "node-llama-cpp";

const __dirname = path.dirname(fileURLToPath(import.meta.url));

const model = new LlamaModel({
    modelPath: path.join(__dirname, "models", "codellama-13b.Q3_K_M.gguf")
});
const context = new LlamaContext({model});
const session = new LlamaChatSession({context});

const q1 = "Hi there, how are you?";
console.log("User: " + q1);

process.stdout.write("AI: ");
const a1 = await session.prompt(q1, {
    onToken(chunk: Token[]) {
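        // fails in the 3.0.0 beta: context.decode() is no longer available,
        // surfacing as the "decode not found" error reported above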
        process.stdout.write(context.decode(chunk));
    }
});

My Environment

Dependency              Version
Operating System        Windows 10
CPU                     Ryzen 5 7600
Node.js version         v20.9.0
TypeScript version      latest
node-llama-cpp version  3.0.0

Additional Context

No response

Relevant Features Used

Are you willing to resolve this issue by submitting a Pull Request?

No, I don’t have the time and I’m okay to wait for the community / maintainers to resolve this issue.

iimez commented 2 months ago

Should probably be model.detokenize(tokens) now.
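
Assuming the rest of the setup has already been migrated to the beta API, the streaming callback from the reproduction above would then look roughly like this (a sketch, where model is the loaded model instance):

const a1 = await session.prompt(q1, {
    onToken(chunk: Token[]) {
        // in the 3.0.0 beta, tokens are turned back into text via the model,
        // replacing context.decode() from 2.8.x
        process.stdout.write(model.detokenize(chunk));
    }
});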

giladgd commented 2 months ago

I've updated the version 3 beta PR to include an example of how to stream a response.
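
The PR example itself isn't quoted in this thread; below is a rough end-to-end sketch of streaming against the 3.0.0 beta API, adapted from the reproduction code above. It assumes the beta's getLlama/loadModel/createContext/getSequence flow, so treat the exact option names as assumptions until checked against the PR:

import {fileURLToPath} from "url";
import path from "path";
import {getLlama, LlamaChatSession, Token} from "node-llama-cpp";

const __dirname = path.dirname(fileURLToPath(import.meta.url));

// in the 3.0.0 beta, models and contexts are created through a Llama instance
// rather than with the LlamaModel/LlamaContext constructors from 2.8.x
const llama = await getLlama();
const model = await llama.loadModel({
    modelPath: path.join(__dirname, "models", "codellama-13b.Q3_K_M.gguf")
});
const context = await model.createContext();
const session = new LlamaChatSession({
    contextSequence: context.getSequence()
});

const q1 = "Hi there, how are you?";
console.log("User: " + q1);

process.stdout.write("AI: ");
const a1 = await session.prompt(q1, {
    onToken(chunk: Token[]) {
        // tokens are detokenized through the model rather than the context
        process.stdout.write(model.detokenize(chunk));
    }
});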