horenbergerb opened this issue 1 year ago
Oh, strange... Updating the context like this:
import llamacpp

progress_callback = lambda progress: None  # no-op model-loading progress callback

params = llamacpp.InferenceParams.default_with_callback(progress_callback)
params.path_model = '/home/captdishwasher/horenbergerb/llama/llama.cpp/models/30Bnew/ggml-model-q4_0-ggjt.bin'
params.n_ctx = 2048  # try to raise the context window from the default 512
model = llamacpp.LlamaInference(params)
Did not change the context in the output logs:
(textgen) captdishwasher@captainofthedishwasher-MS-7D43:~/horenbergerb/llamacpp-python$ python crash_example.py
llama_model_load: loading model from '/home/captdishwasher/horenbergerb/llama/llama.cpp/models/30Bnew/ggml-model-q4_0-ggjt.bin' - please wait ...
llama_model_load: n_vocab = 32000
llama_model_load: n_ctx = 512
llama_model_load: n_embd = 6656
...
So maybe raising n_ctx would fix the problem if it could propagate properly. EDIT: raising n_ctx only pushes the problem further out; the threat of a segfault remains once your prompt grows larger than n_ctx.
Here's some relevant code in llama.cpp. This seems to be where the trick behind "infinite generation via context swapping" is revealed.
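For reference, the idea in that code is roughly: when the next batch of tokens would overflow n_ctx, keep the first n_keep prompt tokens, drop the oldest half of what follows, and re-evaluate the surviving recent tokens before continuing. A rough Python sketch of that logic (plain token lists, names borrowed from the C++ example; not the actual implementation):

def swap_context(past_tokens, pending, n_ctx, n_keep):
    """Sketch of llama.cpp's context swap: if evaluating `pending` on top of
    `past_tokens` would overflow n_ctx, keep the first n_keep tokens, drop the
    oldest half of the rest, and re-feed the surviving recent tokens."""
    if len(past_tokens) + len(pending) <= n_ctx:
        return past_tokens, pending  # still fits, nothing to do
    n_left = len(past_tokens) - n_keep
    kept_prefix = past_tokens[:n_keep]
    recent = past_tokens[len(past_tokens) - n_left // 2:]
    # The recent tokens have to be re-evaluated before the new ones.
    return kept_prefix, recent + pending

past = list(range(500))         # pretend 500 tokens are already in the KV cache
new = list(range(500, 530))     # 30 new tokens would push past n_ctx = 512
past, to_eval = swap_context(past, new, n_ctx=512, n_keep=48)
print(len(past), len(to_eval))  # 48 kept, 226 recent + 30 new = 256 to re-evaluate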
@horenbergerb I didn't add anything for the "infinite generation" behavior in the LlamaInference
wrapper. It is possible that there is something in the underlying code that assumes that you won't exceed the context size.
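That assumption could be surfaced in the wrapper with a bounds check before each eval, so users would get a Python exception instead of a segfault. This is just a hypothetical sketch, not something the bindings currently do:

def check_context_budget(n_past, n_new, n_ctx):
    # Hypothetical guard: refuse to evaluate past the end of the context buffer.
    if n_past + n_new > n_ctx:
        raise RuntimeError(
            f"context overflow: {n_past} past + {n_new} new tokens exceeds n_ctx={n_ctx}"
        )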
@horenbergerb Have you tried this recently? Right now the bindings still fail if you exceed the context size. However, you can now set the context size using params.
Running on Ubuntu, 32GB RAM. I get a segmentation fault by running the following code:
Output:
Possibly related to the context size? The number 512 matches the default n_ctx, but raising n_ctx didn't fix the problem... This has been coming up for users of text-generation-webui, which uses this package: https://github.com/oobabooga/text-generation-webui/issues/690
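For reference, a minimal script in the style of the llamacpp README that runs past the 512-token window looks roughly like this (the model path and prompt are placeholders, and method names may differ between versions of the bindings):

import llamacpp

def progress_callback(progress):
    pass  # ignore model-loading progress

params = llamacpp.InferenceParams.default_with_callback(progress_callback)
params.path_model = './models/7B/ggml-model-q4_0.bin'  # placeholder path
model = llamacpp.LlamaInference(params)

prompt = "Tell me a very long story about llamas."
tokens = model.tokenize(prompt, True)  # True = prepend BOS token
model.update_input(tokens)
model.ingest_all_pending_input()

# With the default n_ctx of 512, sampling enough tokens eventually walks off
# the end of the context and crashes instead of failing gracefully.
for _ in range(600):
    model.eval()
    token = model.sample()
    print(model.token_to_str(token), end="", flush=True)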