Closed · ruochenjia closed this issue 4 months ago
@ruochenjia I found the issue and included the fix in #175
:tada: This issue has been resolved in version 3.0.0-beta.13 :tada:
The release is available on: v3.0.0-beta.13
Your semantic-release bot :package::rocket:
Issue description
EOS token not detected for some models
Expected Behavior
`model.tokens.eos` should be a non-null value after loading the model, and the `sequence.evaluate` call should stop (exit the `for await` loop) without any additional `break` statements when the generation is completed.
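In other words, something like the following should work with no manual stop condition (a minimal sketch using only the calls from the reproduction below; `message` stands in for any prompt string):

```ts
// Expected: after loading, the model exposes its EOS token.
console.log(model.tokens.eos); // expected to be a token value, not null

// Expected: this loop exits on its own once the model emits EOS.
for await (const token of sequence.evaluate(model.tokenize(message, true))) {
    process.stdout.write(model.detokenize([token]));
}
// Reaching this point would mean generation completed via EOS detection.
```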
Actual Behavior
The generation process continues with repeated or random unrelated content, and the EOS token is printed in the generated text as `<dummy32000>`. Currently you have to manually check for the EOS token in the loop in order to stop generating, and `model.tokens.eos` is always `null`.
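The manual check looks roughly like this (a sketch of the workaround, matching the commented-out lines in the reproduction below; `<dummy32000>` is the EOS text this particular model emits):

```ts
// Workaround: scan the accumulated text for the EOS marker ourselves,
// because model.tokens.eos is null and the loop never stops on its own.
let response = "";
for await (const token of sequence.evaluate(model.tokenize(message, true))) {
    const text = model.detokenize([token]);
    response += text;
    if (response.includes("<dummy32000>"))
        break; // stop generation manually once the EOS text appears
}
```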
Steps to reproduce
Using the `mistral-7b-openorca.gguf2.Q4_0.gguf` model downloaded from the GPT4All website:

```ts
import { getLlama, LlamaModel, LlamaContext } from "node-llama-cpp";

const model = new LlamaModel({
    llama: await getLlama({ cuda: true, build: "auto" }),
    useMmap: false,
    useMlock: false,
    modelPath: "./local/mistral-7b-openorca.Q4_0.gguf",
    gpuLayers: 32,
});

const context = new LlamaContext({
    model: model,
    seed: 0,
    threads: 4,
    sequences: 1,
    batchSize: 128,
    contextSize: 2048,
});

const sequence = context.getSequence();
await sequence.clearHistory();

const message = "Hello"; // prompt text (placeholder; any message works)

let response = "";

for await (const token of sequence.evaluate(model.tokenize(message, true), {
    topK: 40,
    topP: 0.4,
    temperature: 0.8,
    evaluationPriority: 5,
})) {
    const text = model.detokenize([token]);
    // if ((response += text).indexOf("<dummy32000>") > 0)
    //     break;
}
```
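After upgrading to 3.0.0-beta.13 (per the fix above), a quick sanity check could look like this (a sketch using only the calls from the reproduction):

```ts
// With the fix, the model's EOS token should be populated after loading.
console.log("eos token:", model.tokens.eos); // should no longer be null

if (model.tokens.eos != null)
    console.log("eos text:", model.detokenize([model.tokens.eos]));
```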