withcatai / node-llama-cpp

Run AI models locally on your machine with Node.js bindings for llama.cpp. Force a JSON schema on the model output at the generation level.
https://withcatai.github.io/node-llama-cpp/
MIT License

Could not find a KV slot #136

Closed: Zambonilli closed this issue 6 months ago

Zambonilli commented 6 months ago

Issue description

In version 2.8.3 I am receiving a "could not find a KV slot for the batch (try reducing the size of the batch or increase the context)" error when running many zero-shot prompts in a loop with the same context.

Expected Behavior

To be able to perform zero-shot prompts in a loop without failure.

Actual Behavior

I receive the error above after a few iterations, regardless of the number of layers or whether I create multiple contexts.
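For illustration, here is a minimal plain-TypeScript sketch of the failure mode being described (this is not node-llama-cpp's internal implementation; the `KvCache` class and its numbers are hypothetical): a fixed-capacity KV cache runs out of slots when each prompt in a loop keeps consuming cells that are never freed.

```typescript
// Hypothetical simulation of a fixed-capacity KV cache.
class KvCache {
    private used = 0;
    constructor(private readonly contextSize: number) {}

    // Try to reserve `batchTokens` cells; returns false when they don't fit,
    // mirroring the "could not find a KV slot for the batch" failure mode.
    allocate(batchTokens: number): boolean {
        if (this.used + batchTokens > this.contextSize) return false;
        this.used += batchTokens;
        return true;
    }

    // Evicting a finished sequence frees its cells again.
    free(batchTokens: number): void {
        this.used = Math.max(0, this.used - batchTokens);
    }
}

const cache = new KvCache(1024);
let failures = 0;
for (let i = 0; i < 10; i++) {
    // Each zero-shot prompt consumes ~200 tokens and is never freed,
    // as if the context kept the state of every previous evaluation.
    if (!cache.allocate(200)) failures++;
}
console.log(failures); // 5 — allocations start failing once the 1024 cells fill up
```

If the library (or user code) does not release slots between independent prompts, the loop fails after a few iterations even though each individual prompt fits comfortably in the context, which matches the symptom reported here.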

Steps to reproduce

  1. Follow the setup instructions for my OSS project, genai-gamelist
  2. Update package.json to change node-llama-cpp from the 3.x beta to 2.8.3
  3. Run the script

My Environment

| Dependency | Version |
| --- | --- |
| Operating System | Ubuntu 22.04 LTS |
| CPU | AMD Ryzen 9 5900X |
| Node.js | 18.x |
| TypeScript | 5.3.3 |
| node-llama-cpp | 2.8.3 |

Additional Context

After only upgrading to the 3.x beta, I am no longer able to reproduce the issue with the same script.

Relevant Features Used

Are you willing to resolve this issue by submitting a Pull Request?

Yes, I have the time, but I don't know how to start. I would need guidance.

giladgd commented 6 months ago

@Zambonilli I originally thought you encountered this issue in the version 3 beta. I'm aware of this issue in version 2.x, and I can confirm that version 3 is supposed to fix it. It's a good sign that you don't encounter it anymore when using version 3.

If you encounter it on the version 3 beta, let me know so I can investigate it.

sagarjgborg commented 6 months ago

The error still persists in node-llama-cpp@3.0.0-beta.1:

[Error: could not find a KV slot for the batch (try reducing the size of the batch or increase the context)]

giladgd commented 6 months ago

@sagarjgborg Please provide code to reproduce this issue on the version 3 beta, along with a link to the model file to use with the code.

sagarjgborg commented 6 months ago

Here is my repo: https://github.com/sagarjgborg/kv_slot_issue.git. Just make a few API calls in a row and you will see the error.

Zambonilli commented 6 months ago

I think you're getting this error because you're explicitly setting the context size to 1024.

sagarjgborg commented 6 months ago

@Zambonilli I noticed that too, but why is that? Can I set a larger context size for a model trained on 4096? For example, a context size of 8192?
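As background on this last question: running a context larger than the model's trained window typically requires RoPE scaling rather than just raising the context size. A minimal sketch of the linear-scaling arithmetic follows (the helper name `ropeFreqScale` is hypothetical, but the trained/requested ratio matches how llama.cpp's linear `rope_freq_scale` is commonly derived, e.g. 0.5 for doubling a 4096 window):

```typescript
// Hypothetical helper: the linear RoPE frequency scale needed to stretch a
// model's trained context window to a larger requested one.
// Values below 1 compress token positions into the trained range.
function ropeFreqScale(trainedContext: number, requestedContext: number): number {
    if (requestedContext <= trainedContext) {
        return 1; // within the trained window, no scaling needed
    }
    return trainedContext / requestedContext;
}

console.log(ropeFreqScale(4096, 8192)); // 0.5 — positions squeezed by half
console.log(ropeFreqScale(4096, 2048)); // 1 — fits the trained window as-is
```

Even with scaling, quality beyond the trained context is not guaranteed; simply requesting 8192 on a 4096-trained model without any scaling usually degrades output rather than fixing KV slot errors.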