withcatai / node-llama-cpp

Run AI models locally on your machine with Node.js bindings for llama.cpp. Force a JSON schema on the model output at the generation level.
https://node-llama-cpp.withcat.ai
MIT License

`(node:23396) UnhandledPromiseRejectionWarning: Error: AbortError` #95

Closed: jparismorgan closed this issue 9 months ago

jparismorgan commented 9 months ago

Issue description

When aborting a `prompt()` call, the error is thrown as an unhandled promise rejection.

Expected Behavior

I can abort an in-progress generation without an unhandled error being thrown.

Actual Behavior

This is thrown:

(node:23396) UnhandledPromiseRejectionWarning: Error: AbortError
    at LlamaChatSession._evalTokens (file:///Users/parismorgan/repo/foo/node_modules/.pnpm/node-llama-cpp@2.8.0/node_modules/node-llama-cpp/dist/llamaEvaluator/LlamaChatSession.js:154:23)
    at async file:///Users/parismorgan/repo/foo/node_modules/.pnpm/node-llama-cpp@2.8.0/node_modules/node-llama-cpp/dist/llamaEvaluator/LlamaChatSession.js:106:72
    at async withLock (file:///Users/parismorgan/repo/foo/node_modules/.pnpm/node-llama-cpp@2.8.0/node_modules/node-llama-cpp/dist/utils/withLock.js:11:16)
    at async LlamaChatSession.promptWithMeta (file:///Users/parismorgan/repo/foo/node_modules/.pnpm/node-llama-cpp@2.8.0/node_modules/node-llama-cpp/dist/llamaEvaluator/LlamaChatSession.js:74:16)
    at async LlamaChatSession.prompt (file:///Users/parismorgan/repo/foo/node_modules/.pnpm/node-llama-cpp@2.8.0/node_modules/node-llama-cpp/dist/llamaEvaluator/LlamaChatSession.js:62:26)
(Use `Electron --trace-warnings ...` to show where the warning was created)
(node:23396) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). To terminate the node process on unhandled promise rejection, use the CLI flag `--unhandled-rejections=strict` (see https://nodejs.org/api/cli.html#cli_unhandled_rejections_mode). (rejection id: 1)
(node:23396) UnhandledPromiseRejectionWarning: Error: AbortError
    at LlamaChatSession._evalTokens (file:///Users/parismorgan/repo/foo/node_modules/.pnpm/node-llama-cpp@2.8.0/node_modules/node-llama-cpp/dist/llamaEvaluator/LlamaChatSession.js:154:23)
    at async file:///Users/parismorgan/repo/foo/node_modules/.pnpm/node-llama-cpp@2.8.0/node_modules/node-llama-cpp/dist/llamaEvaluator/LlamaChatSession.js:106:72
    at async withLock (file:///Users/parismorgan/repo/foo/node_modules/.pnpm/node-llama-cpp@2.8.0/node_modules/node-llama-cpp/dist/utils/withLock.js:11:16)
    at async LlamaChatSession.promptWithMeta (file:///Users/parismorgan/repo/foo/node_modules/.pnpm/node-llama-cpp@2.8.0/node_modules/node-llama-cpp/dist/llamaEvaluator/LlamaChatSession.js:74:16)
    at async LlamaChatSession.prompt (file:///Users/parismorgan/repo/foo/node_modules/.pnpm/node-llama-cpp@2.8.0/node_modules/node-llama-cpp/dist/llamaEvaluator/LlamaChatSession.js:62:26)
(node:23396) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). To terminate the node process on unhandled promise rejection, use the CLI flag `--unhandled-rejections=strict` (see https://nodejs.org/api/cli.html#cli_unhandled_rejections_mode). (rejection id: 2)

Clicking into the stack trace leads to this line in LlamaChatSession.js:

for await (const chunk of evaluationIterator) {
    if (signal?.aborted)
        throw new AbortError();

and, one frame up the stack, to this call:

const { text, stopReason, stopString, stopStringSuffix } = await this._evalTokens(this._ctx.encode(promptText), {
    onToken, signal, maxTokens, temperature, topK, topP, grammar, trimWhitespaceSuffix,
    repeatPenalty: repeatPenalty == false ? { lastTokens: 0 } : repeatPenalty
});
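
For context, this appears to be the standard AbortSignal pattern: a long-running async loop polls `signal.aborted` between chunks and throws once the caller aborts, which rejects the promise returned by `prompt()`. A minimal standalone sketch (plain Node.js, no node-llama-cpp; `fakeTokenStream` and `generate` are made-up names):

// Made-up names for illustration; only the signal-polling pattern mirrors
// what LlamaChatSession._evalTokens does.
async function* fakeTokenStream(): AsyncGenerator<number> {
    for (let i = 0; i < 1000; i++) {
        await new Promise((resolve) => setTimeout(resolve, 10));
        yield i;
    }
}

async function generate(signal?: AbortSignal): Promise<void> {
    for await (const token of fakeTokenStream()) {
        // The signal is checked between tokens, so an abort takes effect at
        // the next token boundary rather than instantly.
        if (signal?.aborted)
            throw new Error("AbortError");
        console.log("token", token);
    }
}

const controller = new AbortController();
setTimeout(() => controller.abort(), 50);

// Without this catch, the rejection surfaces exactly like the warning above.
generate(controller.signal).catch((err) => console.log("stopped:", err.message));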

Steps to reproduce

I first set things up like this:

import {LlamaModel, LlamaContext, LlamaChatSession, LlamaChatPromptWrapper} from "node-llama-cpp"

const model = new LlamaModel({
  // note: '~' is not expanded by Node.js; an absolute path may be needed
  modelPath: '~/repo/gguf-models/llama-2-7b-chat.Q4_K_M.gguf',
  useMlock: true
})
const context = new LlamaContext({
  model,
  batchSize: 512,
  threads: 8,
  contextSize: 4096
})
const session = new LlamaChatSession({
  context,
  systemPrompt: "This is a transcript of a never ending conversation between Paris and Siri. This is the personality of Siri: Siri is a knowledgeable and friendly AI. They are very curious and will ask the user a lot of questions about themselves and their life.\nSiri is a virtual assistant who lives on Paris's computer.",
  printLLamaSystemInfo: true,
  promptWrapper: new LlamaChatPromptWrapper()
})

Then I kick off a text generation:

const abortController = new AbortController()
ipcMain.on('message', async (_event, {message}) => {
  await session.prompt(message, {
    onToken,
    signal: abortController.signal
  })
})

And then I abort it:

ipcMain.on('stop-generation', () => {
  abortController.abort()
})
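
For what it's worth, I think the reason this surfaces as an UnhandledPromiseRejectionWarning is that `ipcMain.on` never awaits the async callback, so the rejected `prompt()` promise has no handler. A reduced example of that failure mode (names are illustrative, no Electron needed):

import {EventEmitter} from "node:events";

const emitter = new EventEmitter();

// Stand-in for the ipcMain handler: an async listener whose body rejects.
emitter.on("message", async () => {
    // Stand-in for `await session.prompt(...)` rejecting on abort.
    throw new Error("AbortError");
});

// The emitter fires the listener but never awaits the returned promise, so
// the rejection is reported as unhandled instead of being caught anywhere.
emitter.emit("message");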

My Environment

| Dependency             | Version                 |
| ---------------------- | ----------------------- |
| Operating System       | macOS                   |
| CPU                    | Intel i9 / Apple M2 Pro |
| Node.js version        | v18.16.0                |
| TypeScript version     | 5.1.6                   |
| node-llama-cpp version | 2.8.0                   |
| Electron version       | 25.3.0                  |

Additional Context

Is the appropriate thing to do here just to wrap the code in a try/catch? Or is this actually a bug? Thank you for any help!

Are you willing to resolve this issue by submitting a Pull Request?

Yes, I have the time, but I don't know how to start. I would need guidance.

giladgd commented 9 months ago

@jparismorgan This behavior is intentional and follows the standard JS AbortController convention. If your intent is to ignore an aborted evaluation, simply detect this kind of error and ignore it.

For example:

import {AbortError, ...} from "node-llama-cpp";

...

try {
    await session.prompt(message, {
        onToken,
        signal: abortController.signal
    });
} catch(err) {
    if (!(err instanceof AbortError))
        throw err;
}
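
Putting it together with the reproduction above, a sketch of the full wiring (untested; `session` and `onToken` are assumed to come from your setup code). It also creates a fresh AbortController per prompt, since a controller that has already fired stays aborted and would cancel every later generation immediately:

import {ipcMain} from "electron";
import {AbortError} from "node-llama-cpp";

let abortController: AbortController | null = null;

ipcMain.on("message", async (_event, {message}) => {
    // A new controller per generation; an aborted signal never resets.
    abortController = new AbortController();
    try {
        await session.prompt(message, {
            onToken,
            signal: abortController.signal
        });
    } catch (err) {
        // Swallow only the expected AbortError from pressing stop.
        if (!(err instanceof AbortError))
            throw err;
    }
});

ipcMain.on("stop-generation", () => {
    abortController?.abort();
});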