withcatai / node-llama-cpp

Run AI models locally on your machine with node.js bindings for llama.cpp. Force a JSON schema on the model output on the generation level
https://node-llama-cpp.withcat.ai
MIT License

Llama2 Template Error #41

Closed saul-jb closed 1 year ago

saul-jb commented 1 year ago

Issue description

The Llama2 template (LlamaChatPromptWrapper) appears not to work with Llama2 models.

Expected Behavior

Using the LlamaChatPromptWrapper, I would expect the model to produce a normal response.

Actual Behavior

When I use LlamaChatPromptWrapper, it seems to get stuck and produces the following output:

GGML_ASSERT: node_modules/node-llama-cpp/llama/llama.cpp/ggml.c:4785: view_src == NULL || data_size + view_offs <= ggml_nbytes(view_src)

I suspect this is a result of it not understanding the template/stop tokens.

Steps to reproduce

Use the 7B model: https://huggingface.co/TheBloke/Llama-2-7B-GGUF

Run the following code:

import { LlamaModel, LlamaContext, LlamaChatSession, LlamaChatPromptWrapper } from "node-llama-cpp";

// quantized base (non-chat) Llama 2 7B model from the repo linked above
const modelPath = "llama-2-7b.Q4_K_M.gguf";

const model = new LlamaModel({ modelPath, gpuLayers: 64 }); // offload 64 layers to the GPU
const context = new LlamaContext({ model });
const session = new LlamaChatSession({ context, promptWrapper: new LlamaChatPromptWrapper() });

const q1 = "What is a llama?";
console.log("User: " + q1);

const a1 = await session.prompt(q1);
console.log("AI: " + a1);

My Environment

Dependency              Version
Operating System        Ubuntu 22.04
Node.js version         19.1.0
Typescript version      4.8.4
node-llama-cpp version  2.4.0

Additional Context

The GeneralChatPromptWrapper seems to work normally, with the exception of adding "\n\n### :" to the stop tokens. Why does the general prompt wrapper work while the Llama-specific one doesn't? Is this an issue with the model file itself, e.g. a bad conversion? Is there a better way to debug this?

Related: https://huggingface.co/TheBloke/Llama-2-7B-GGUF/discussions/1
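
To make the last question concrete: one way to see what a wrapper actually feeds the model would be to subclass it and log the wrapped prompt and stop strings. This is only a sketch; it assumes the wrapPrompt and getStopStrings methods can be overridden like this (I haven't verified the exact signatures, and they may differ between versions):

import { LlamaChatPromptWrapper } from "node-llama-cpp";

// Hypothetical wrapper for debugging only: logs the exact prompt text and
// stop strings the wrapper produces so they can be compared with what the
// model expects. Method names and signatures are assumed, not verified.
class DebugLlamaChatPromptWrapper extends LlamaChatPromptWrapper {
    override wrapPrompt(prompt: string, options: any) {
        const wrapped = super.wrapPrompt(prompt, options);
        console.log("Wrapped prompt:", JSON.stringify(wrapped));
        return wrapped;
    }

    override getStopStrings() {
        const stops = super.getStopStrings();
        console.log("Stop strings:", stops);
        return stops;
    }
}

// then pass it in place of the regular wrapper:
// new LlamaChatSession({ context, promptWrapper: new DebugLlamaChatPromptWrapper() });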

Relevant Features Used

Are you willing to resolve this issue by submitting a Pull Request?

Yes, I have the time, but I don't know how to start. I would need guidance.

giladgd commented 1 year ago

The output you mentioned seems to come from the llama.cpp code itself rather than from the model, so I don't think the wrapper is related to the issue you are facing.

I suggest trying a chat-tuned version of the Llama 2 model instead, for example this one: https://huggingface.co/TheBloke/Llama-2-7b-Chat-GGUF

From my own experience, GeneralChatPromptWrapper works better for most models.

You can also try setting a custom systemPrompt parameter on a LlamaChatSession.
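
Putting those suggestions together, here's a rough sketch of what I mean (the chat model file name is an assumption based on the quantized files in the repo linked above):

import { LlamaModel, LlamaContext, LlamaChatSession, GeneralChatPromptWrapper } from "node-llama-cpp";

// assumed file name for a Q4_K_M quant from the Llama-2-7b-Chat-GGUF repo linked above
const modelPath = "llama-2-7b-chat.Q4_K_M.gguf";

const model = new LlamaModel({ modelPath, gpuLayers: 64 });
const context = new LlamaContext({ model });

// GeneralChatPromptWrapper plus a custom systemPrompt to steer the responses
const session = new LlamaChatSession({
    context,
    promptWrapper: new GeneralChatPromptWrapper(),
    systemPrompt: "You are a helpful assistant. Answer questions concisely."
});

console.log(await session.prompt("What is a llama?"));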

If none of these work, try downloading a newer release of llama.cpp and compiling it from source using this command:

node-llama-cpp download --release latest

giladgd commented 1 year ago

I'm closing this issue for now as it seems unrelated to node-llama-cpp.

If you continue to face this issue, I suggest you open an issue on llama.cpp itself.