Issue description
I have a webserver that uses node-llama-cpp under the hood. Short inputs work fine, but if I enter too many tokens, llama.cpp fails with the error "n_tokens <= n_batch". The problem is not the error itself: the problem is that it effectively calls process.exit() and kills my entire webserver (Next.js), instead of throwing an exception that I could catch.
Expected Behavior
When an error occurs (such as input exceeding the context length), it would be much better to throw an Error, rather than force exiting the node runtime. User input should never be able to terminate my server.
Actual Behavior
The Node process exits abruptly with the "n_tokens <= n_batch" assertion failure. The error is not catchable.
Steps to reproduce
Use the basic node-llama-cpp usage example: create a LlamaModel, create a LlamaContext, create a LlamaChatSession, then call session.prompt() with a long input (more than 512 tokens, the default batch size).
My Environment
| Dependency | Version |
| --- | --- |
| Operating System | Mac |
| CPU | Apple M3 |
| Node.js version | 20.9.0 |
| Typescript version | 5.2.2 |
| node-llama-cpp version | 2.8.3 |
Additional Context
Note: I can increase the batch/context size; that's not the issue. The issue is the Node process exiting rather than throwing.
I tried to trace what was happening and added additional try/catch statements, but the failure appears to happen inside the native addon.cpp implementation of eval. That's where I got stuck; I'm open to suggestions.
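Until the native layer throws instead of aborting, the server can enforce the n_tokens <= n_batch invariant itself before calling session.prompt. A minimal sketch of that guard (assertFitsBatch and the fixed numbers are illustrative assumptions; in practice the token count would come from the library's tokenizer):

```typescript
// Application-level guard: enforce the n_tokens <= n_batch invariant ourselves,
// so oversized user input is rejected with a catchable Error instead of
// reaching the native eval call that aborts the process.
function assertFitsBatch(tokenCount: number, batchSize: number): void {
  if (tokenCount > batchSize) {
    throw new Error(
      `Input is ${tokenCount} tokens, but the batch size is only ${batchSize}`
    );
  }
}

// Example: a 600-token prompt against the default batch size of 512
try {
  assertFitsBatch(600, 512);
} catch (err) {
  // The webserver can now return a 4xx response instead of dying
  console.log((err as Error).message);
}
```

This only mitigates the symptom at the application layer; the underlying fix still belongs in the native bindings.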
Relevant Features Used
[ ] Metal support
[ ] CUDA support
[ ] Grammar
Are you willing to resolve this issue by submitting a Pull Request?
Yes, I have the time, and I know how to start.
@platypii This is a known bug in version 2.x. You can either set your batchSize to the same value as your contextSize, or switch to the version 3 beta, which fixes this issue.
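For the first workaround, a sketch of the matching options object (the option names are my reading of the 2.x API, not verified against the docs; the actual LlamaContext call is commented out since it needs a local model file):

```typescript
// Keep batchSize equal to contextSize so any prompt that fits in the context
// can never violate the native n_tokens <= n_batch assertion.
const contextOptions = { contextSize: 4096, batchSize: 4096 };

// Usage (assumed 2.x API, requires node-llama-cpp and a loaded model):
// const context = new LlamaContext({ model, ...contextOptions });

console.log(contextOptions.batchSize === contextOptions.contextSize); // true
```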