rustformers / llm

[Unmaintained, see README] An ecosystem of Rust libraries for working with large language models
https://docs.rs/llm/latest/llm/
Apache License 2.0
6.06k stars 350 forks source link

Not enough space in the context's memory pool #455

Open thecodechemist99 opened 2 months ago

thecodechemist99 commented 2 months ago

Hi!

I’m trying to use this crate for a project of mine. But it crashes as soon as a model has been loaded.

Expected behaviour

After loading the model, I can use it.

Actual behaviour

The model loads, but crashes immediately with the following output:

Loaded tensor 8/201
Loaded tensor 16/201
Loaded tensor 24/201
Loaded tensor 32/201
Loaded tensor 40/201
Loaded tensor 48/201
Loaded tensor 56/201
Loaded tensor 64/201
Loaded tensor 72/201
Loaded tensor 80/201
Loaded tensor 88/201
Loaded tensor 96/201
Loaded tensor 104/201
Loaded tensor 112/201
Loaded tensor 120/201
Loaded tensor 128/201
Loaded tensor 136/201
Loaded tensor 144/201
Loaded tensor 152/201
Loaded tensor 160/201
Loaded tensor 168/201
Loaded tensor 176/201
Loaded tensor 184/201
Loaded tensor 192/201
Loaded tensor 200/201
Loading of model complete
Model size = 745.81 MB / num tensors = 201
ggml_new_object: not enough space in the context's memory pool (needed 184549744, available 2097152)
zsh: segmentation fault  cargo run

I tried a few different models, all showed the same behaviour.

Setup

OS: MacOS 14.1.4 (M1) Rust version: 1.76.0 Crate: I tried the main branch as well as the gguf branch with the same results.

Edit: I found the ModelParameters struct with the context_size field. Unfortunately, increasing this value doesn’t change a thing about the error, even the displayed available memory stays exactly the same. I also tried to set prefer_mmap to false, as this is suggested for resource constrained environments. This actually gets rid of the aforementioned error but instead throws a “non-specific I/O error”.

Edit 2: Actually, reducing the context_size decreases the “needed” memory. I got it to almost reach the “available” memory value by setting it very low, but then it went up again: