Simple script like:

```python
import llm

model = llm.get_model("orca-mini-3b-gguf2-q4_0")
for i in range(70):
    print(i, model.prompt("How are you today?"))
```
seems to always crash on the 61st `prompt()` call. It doesn't seem to be related to running out of memory, but to something else.

I'm not quite sure whether this is actually an llm-gpt4all issue, or an issue in gpt4all or even in llama.cpp. But at least I didn't see issues when using gpt4all directly (the following version works):
```python
from gpt4all import GPT4All

MODEL = "orca-mini-3b-gguf2-q4_0.gguf"  # same model as above

model = GPT4All(MODEL)
for i in range(70):
    print(model.generate("How are you", max_tokens=5))
```
Anyway, the gpt4all Python API behaves quite a bit differently here. E.g., llm-gpt4all re-creates the LLModel object in Python for each prompt.
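To make the suspected difference concrete, here is a minimal sketch of the two usage patterns — one long-lived model object versus a fresh object per prompt, which is roughly what llm-gpt4all appears to do. `FakeModel` is a made-up stand-in, not the real GPT4All/LLModel binding; the point is only the object-lifetime difference:

```python
# Illustrative only: FakeModel stands in for the native binding. The class
# counter simulates a backing resource (e.g. a llama.cpp context) that is
# allocated per instance and, hypothetically, never released.
class FakeModel:
    live_instances = 0  # simulated count of still-allocated native contexts

    def __init__(self, name):
        self.name = name
        FakeModel.live_instances += 1  # "allocate" a context on construction

    def prompt(self, text):
        return f"({self.name}) reply to: {text}"

# Pattern A: what the direct gpt4all script does -- one object, many prompts.
model = FakeModel("orca-mini-3b-gguf2-q4_0")
for i in range(3):
    model.prompt("How are you today?")
assert FakeModel.live_instances == 1

# Pattern B: roughly what llm-gpt4all seems to do -- a fresh object per
# prompt. If the backing resource isn't freed, usage grows with every call,
# which could explain a failure after a fixed number of prompts.
FakeModel.live_instances = 0
for i in range(3):
    FakeModel(f"instance-{i}").prompt("How are you today?")
assert FakeModel.live_instances == 3
```

Under that hypothesis, the 61st call would simply be the point where some per-instance resource runs out, rather than anything special about the prompt itself.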
The coredump shows that the `ctx` variable seen by the C++ code is null, but I'm not sure how exactly that happens:

```
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007f5a024a8ac4 in ggml_new_object (ctx=ctx@entry=0x0, type=type@entry=GGML_OBJECT_GRAPH, size=262440)
    at /home/johannes/ai/llm/gpt4all/gpt4all-backend/llama.cpp-mainline/ggml.c:2430
2430        struct ggml_object * obj_cur = ctx->objects_end;
(gdb) bt
#0  0x00007f5a024a8ac4 in ggml_new_object (ctx=ctx@entry=0x0, type=type@entry=GGML_OBJECT_GRAPH, size=262440)
#1  0x00007f5a024cc3c2 in ggml_new_graph_custom (ctx=0x0, size=8192, grads=false)
#2  0x00007f5a02488bae in llm_build_context::build_llama (this=this@entry=0x7ffc5fb27d60)
#3  0x00007f5a024606c4 in llama_build_graph (lctx=..., batch=...) at /home/johannes/ai/llm/gpt4all/gpt4all-backend/llama.cpp-mainline/llama.cpp:6191
#4  0x00007f5a0246e405 in llama_new_context_with_model (model=<optimized out>, params=...)
#5  0x00007f5a02456a98 in LLamaModel::loadModel (this=0x419eed0, modelPath="/home/johannes/.cache/gpt4all/orca-mini-3b-gguf2-q4_0.gguf", n_ctx=<optimized out>)
#6  0x00007f5a03f770df in llmodel_loadModel (model=<optimized out>, model_path=0x7f5a02acbdd0 "/home/johannes/.cache/gpt4all/orca-mini-3b-gguf2-q4_0.gguf",
#7  0x00007f5a03f898b6 in ffi_call_unix64 () at ../src/x86/unix64.S:104
#8  0x00007f5a03f8634d in ffi_call_int (cif=cif@entry=0x7ffc5fb297f0, fn=<optimized out>, rvalue=<optimized out>, avalue=<optimized out>,
#9  0x00007f5a03f88f33 in ffi_call (cif=cif@entry=0x7ffc5fb297f0, fn=fn@entry=0x7f5a03f77060 <llmodel_loadModel(llmodel_model, char const*, int)>,
#10 0x00007f5a042142e9 in _call_function_pointer (argtypecount=<optimized out>, argcount=3, resmem=0x7ffc5fb29700, restype=<optimized out>,
#11 _ctypes_callproc (pProc=<optimized out>, argtuple=<optimized out>, flags=<optimized out>, argtypes=<optimized out>, restype=<optimized out>,
```
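Reading the backtrace bottom-up, `llmodel_loadModel` ends up in `llama_new_context_with_model`, and by the time the graph is built the context is already `0x0` — so some earlier allocation presumably failed without being checked. A minimal sketch of that failure mode in Python (all names here are invented for illustration, not gpt4all's or ggml's actual API):

```python
# Hypothetical illustration: an allocation that can fail returns None, and a
# later step uses the handle without checking -- analogous to ggml_new_object
# being handed ctx=0x0 in frame #0 above.
class Ctx:
    def __init__(self):
        self.objects_end = None  # plays the role of ctx->objects_end

class Graph:
    def __init__(self, ctx):
        # Mirrors the failing line in ggml.c:2430 reading ctx->objects_end:
        # this raises when ctx is None, as the C code segfaults on NULL.
        self.first = ctx.objects_end

def alloc_ctx(budget):
    # Stand-in for whatever internal allocation fails on the 61st call;
    # a zero "budget" simulates the exhausted resource.
    return Ctx() if budget > 0 else None

ctx = alloc_ctx(budget=0)   # the failing case: allocation yields nothing
try:
    Graph(ctx)              # unchecked use, like llama_build_graph
except AttributeError as e:
    print("crash analogue:", e)
```

If something like this is happening, the real bug would be upstream of frame #0: whatever returns the null context should either fail loudly or be checked before graph construction.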
I'm using Ubuntu and Python 3.11.6; no GPU is used here.