simonw / llm-gpt4all

Plugin for LLM adding support for the GPT4All collection of models
Apache License 2.0

Repeated prompt segfaults on 61st iteration #22

Open jonppe opened 5 months ago

jonppe commented 5 months ago

Simple script like:

import llm
model = llm.get_model("orca-mini-3b-gguf2-q4_0")
for i in range(70):
    print(i, model.prompt("How are you today?"))

seems to always crash on the 61st prompt() call. It doesn't seem to be related to running out of memory; it looks like something else.

I'm not quite sure whether this is actually an llm-gpt4all issue, an issue in gpt4all, or even in llama.cpp. At least I didn't see the problem when using gpt4all directly (the following version works):

from gpt4all import GPT4All
model = GPT4All(MODEL)
for i in range(70):
    print(model.generate("How are you", max_tokens=5))

Anyway, the gpt4all Python API is used quite differently here. E.g., llm-gpt4all re-creates the LLModel object in Python for each prompt.
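
To isolate that, here's a minimal sketch of mine (not taken from llm-gpt4all's source) that mimics the per-prompt re-creation using the plain gpt4all API from the snippet above; if re-creating the model object each time is the trigger, I'd expect this loop to hit the same failure:

from gpt4all import GPT4All

# Sketch only: construct a fresh GPT4All (and thus a fresh LLModel) for every
# prompt, mimicking the per-prompt re-creation described above, instead of
# reusing one instance as in the working loop above.
for i in range(70):
    model = GPT4All("orca-mini-3b-gguf2-q4_0.gguf")
    print(i, model.generate("How are you", max_tokens=5))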

The coredump shows that the ctx variable seen by the C++ code is null, but I'm not sure how exactly that happens:

Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007f5a024a8ac4 in ggml_new_object (ctx=ctx@entry=0x0, type=type@entry=GGML_OBJECT_GRAPH, size=262440)
    at /home/johannes/ai/llm/gpt4all/gpt4all-backend/llama.cpp-mainline/ggml.c:2430
2430        struct ggml_object * obj_cur = ctx->objects_end;
(gdb) bt
#0  0x00007f5a024a8ac4 in ggml_new_object (ctx=ctx@entry=0x0, type=type@entry=GGML_OBJECT_GRAPH, size=262440)
    at /home/johannes/ai/llm/gpt4all/gpt4all-backend/llama.cpp-mainline/ggml.c:2430
#1  0x00007f5a024cc3c2 in ggml_new_graph_custom (ctx=0x0, size=8192, grads=false)
    at /home/johannes/ai/llm/gpt4all/gpt4all-backend/llama.cpp-mainline/ggml.c:15834
#2  0x00007f5a02488bae in llm_build_context::build_llama (this=this@entry=0x7ffc5fb27d60)
    at /home/johannes/ai/llm/gpt4all/gpt4all-backend/llama.cpp-mainline/llama.cpp:4326
#3  0x00007f5a024606c4 in llama_build_graph (lctx=..., batch=...) at /home/johannes/ai/llm/gpt4all/gpt4all-backend/llama.cpp-mainline/llama.cpp:6191
#4  0x00007f5a0246e405 in llama_new_context_with_model (model=<optimized out>, params=...)
    at /home/johannes/ai/llm/gpt4all/gpt4all-backend/llama.cpp-mainline/llama.cpp:9514
#5  0x00007f5a02456a98 in LLamaModel::loadModel (this=0x419eed0, modelPath="/home/johannes/.cache/gpt4all/orca-mini-3b-gguf2-q4_0.gguf", n_ctx=<optimized out>)
    at /home/johannes/ai/llm/gpt4all/gpt4all-backend/llamamodel.cpp:215
#6  0x00007f5a03f770df in llmodel_loadModel (model=<optimized out>, model_path=0x7f5a02acbdd0 "/home/johannes/.cache/gpt4all/orca-mini-3b-gguf2-q4_0.gguf",
    n_ctx=2048) at /usr/include/c++/13/bits/basic_string.tcc:238
#7  0x00007f5a03f898b6 in ffi_call_unix64 () at ../src/x86/unix64.S:104
#8  0x00007f5a03f8634d in ffi_call_int (cif=cif@entry=0x7ffc5fb297f0, fn=<optimized out>, rvalue=<optimized out>, avalue=<optimized out>,
    closure=closure@entry=0x0) at ../src/x86/ffi64.c:673
#9  0x00007f5a03f88f33 in ffi_call (cif=cif@entry=0x7ffc5fb297f0, fn=fn@entry=0x7f5a03f77060 <llmodel_loadModel(llmodel_model, char const*, int)>,
    rvalue=rvalue@entry=0x7ffc5fb29700, avalue=<optimized out>) at ../src/x86/ffi64.c:710
#10 0x00007f5a042142e9 in _call_function_pointer (argtypecount=<optimized out>, argcount=3, resmem=0x7ffc5fb29700, restype=<optimized out>,
    atypes=<optimized out>, avalues=<optimized out>, pProc=0x7f5a03f77060 <llmodel_loadModel(llmodel_model, char const*, int)>, flags=<optimized out>)
    at /usr/src/python3.11-3.11.6-3/Modules/_ctypes/callproc.c:923
#11 _ctypes_callproc (pProc=<optimized out>, argtuple=<optimized out>, flags=<optimized out>, argtypes=<optimized out>, restype=<optimized out>,
    checker=<optimized out>) at /usr/src/python3.11-3.11.6-3/Modules/_ctypes/callproc.c:1262
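
Frames #4 to #6 show the crash happens while the model is being loaded again (llmodel_loadModel into llama_new_context_with_model), not during generation. As an untested sketch, reusing a single llm conversation might sidestep the per-prompt reload, assuming the conversation keeps the same underlying model object alive, which is exactly the part I'm unsure about:

import llm

# Untested workaround sketch: keep a single conversation so that (hopefully)
# the same model object is reused across prompts instead of being reloaded.
# Note the prompt context grows with each turn, so this is not equivalent
# to the original loop.
model = llm.get_model("orca-mini-3b-gguf2-q4_0")
conversation = model.conversation()
for i in range(70):
    print(i, conversation.prompt("How are you today?"))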

I'm using Ubuntu, Python 3.11.6, no GPU used here.

Rubiel1 commented 4 months ago

Hi, I use Fedora 38, Python 3.11.8, no GPU, and I have the same problem.