shuttie opened this issue 1 month ago
And here's the debug build:
Current thread (0x00007f7124000bd0): JavaThread "Thread-0" [_thread_in_native, id=16695, stack(0x00007f7250401000,0x00007f7250c01000)]
Stack: [0x00007f7250401000,0x00007f7250c01000], sp=0x00007f7250bfeaa0, free space=8182k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
C [libggml.so+0x57c4e] ggml_backend_buffer_clear+0x15
C [libllama.so+0x262087] llama_kv_cache_clear(llama_kv_cache&)+0xe4
C [libllama.so+0x2c0fe8] llama_kv_cache_clear+0x1e
C [libjllama.so+0x2590a0] server_context::kv_cache_clear()+0x1c
C [libjllama.so+0x26378e] server_context::update_slots()+0x75e
C [libjllama.so+0x2bdd02] void std::__invoke_impl<void, void (server_context::*&)(), server_context*&>(std::__invoke_memfun_deref, void (server_context::*&)(), server_context*&)+0x67
C [libjllama.so+0x2b7fa5] std::__invoke_result<void (server_context::*&)(), server_context*&>::type std::__invoke<void (server_context::*&)(), server_context*&>(void (server_context::*&)(), server_context*&)+0x37
C [libjllama.so+0x2afc42] void std::_Bind<void (server_context::*(server_context*))()>::__call<void, , 0ul>(std::tuple<>&&, std::_Index_tuple<0ul>)+0x48
C [libjllama.so+0x2a9df2] void std::_Bind<void (server_context::*(server_context*))()>::operator()<, void>()+0x24
C [libjllama.so+0x29e3a2] void std::__invoke_impl<void, std::_Bind<void (server_context::*(server_context*))()>&>(std::__invoke_other, std::_Bind<void (server_context::*(server_context*))()>&)+0x20
C [libjllama.so+0x294137] std::enable_if<std::__and_<std::is_void<void>, std::__is_invocable<std::_Bind<void (server_context::*(server_context*))()>&> >::value, void>::type std::__invoke_r<void, std::_Bind<void (server_context::*(server_context*))()>&>(std::_Bind<void (server_context::*(server_context*))()>&)+0x20
C [libjllama.so+0x283fc1] std::_Function_handler<void (), std::_Bind<void (server_context::*(server_context*))()> >::_M_invoke(std::_Any_data const&)+0x20
C [libjllama.so+0x26e7f4] std::function<void ()>::operator()() const+0x32
C [libjllama.so+0x25286e] server_queue::start_loop()+0x23c
C [libjllama.so+0x2471e2] Java_de_kherud_llama_LlamaModel_loadModel::{lambda()#1}::operator()() const+0xa4
C [libjllama.so+0x24bffa] void std::__invoke_impl<void, Java_de_kherud_llama_LlamaModel_loadModel::{lambda()#1}>(std::__invoke_other, Java_de_kherud_llama_LlamaModel_loadModel::{lambda()#1}&&)+0x20
C [libjllama.so+0x24bfaf] std::__invoke_result<Java_de_kherud_llama_LlamaModel_loadModel::{lambda()#1}>::type std::__invoke<Java_de_kherud_llama_LlamaModel_loadModel::{lambda()#1}>(Java_de_kherud_llama_LlamaModel_loadModel::{lambda()#1}&&)+0x20
C [libjllama.so+0x24bf5c] void std::thread::_Invoker<std::tuple<Java_de_kherud_llama_LlamaModel_loadModel::{lambda()#1}> >::_M_invoke<0ul>(std::_Index_tuple<0ul>)+0x28
C [libjllama.so+0x24bf30] std::thread::_Invoker<std::tuple<Java_de_kherud_llama_LlamaModel_loadModel::{lambda()#1}> >::operator()()+0x18
C [libjllama.so+0x24bf14] std::thread::_State_impl<std::thread::_Invoker<std::tuple<Java_de_kherud_llama_LlamaModel_loadModel::{lambda()#1}> > >::_M_run()+0x1c
siginfo: si_signo: 11 (SIGSEGV), si_code: 1 (SEGV_MAPERR), si_addr: 0x00007f7553bc5329
Registers:
RAX=0x00007f7553bc52f1, RBX=0x00007f72a49dc9c0, RCX=0x0000000000000235, RDX=0x00007f7250bfead8
RSP=0x00007f7250bfeaa0, RBP=0x00007f7250bfeab0, RSI=0x0000000000000000, RDI=0x00007f7553bc52f1
R8 =0x0000000000000002, R9 =0x0000000000000001, R10=0x000000000000000a, R11=0x00007f725e859084
R12=0xffffffffffffff88, R13=0x0000000000000002, R14=0x00007f72a9b4d980, R15=0x00007f72a9b4da87
RIP=0x00007f725f5ecc4e, EFLAGS=0x0000000000010206, CSGSFS=0x002b000000000033, ERR=0x0000000000000004
TRAPNO=0x000000000000000e
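The failure mode visible in this dump — a worker thread spawned during model load dereferencing memory that teardown has already freed — can be mimicked in plain JVM code. This is an illustrative toy in Java, unrelated to the real native crash; `RacyModel` and its members are made-up names, and on the JVM the stale access surfaces as a `NullPointerException` rather than a SIGSEGV, but the ordering bug has the same shape:

```java
import java.util.concurrent.CountDownLatch;

// Toy analogue of the crash: the constructor spawns a worker that uses
// `buffer` after a delay, while close() frees it immediately.
class RacyModel {
    volatile byte[] buffer = new byte[16];    // stand-in for native state
    volatile boolean workerFailed = false;
    final CountDownLatch done = new CountDownLatch(1);

    RacyModel() {
        new Thread(() -> {
            try { Thread.sleep(100); } catch (InterruptedException ignored) {}
            try {
                buffer[0] = 1;                // touches state close() may have freed
            } catch (NullPointerException e) {
                workerFailed = true;          // the JVM-visible "crash"
            }
            done.countDown();
        }).start();
    }

    void close() { buffer = null; }           // frees state unconditionally

    void awaitWorker() {
        try { done.await(); } catch (InterruptedException ignored) {}
    }
}
```

Calling `close()` right after construction makes the worker observe the torn-down state, mirroring the crash in the dump above.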
And the last bit:
val params = new ModelParameters().setModelFilePath("qwen2-0_5b-instruct-q4_0.gguf")
val model = new LlamaModel(params)
Thread.sleep(1000)
model.close() // <-- no crash!
So it seems like a race condition on start: the model is not yet fully loaded, but we already start unloading it.
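One common guard against this kind of race (a hypothetical sketch, not java-llama.cpp's actual implementation; `FakeModel` and all its members are made-up names) is to have the loader thread signal completion and make close() wait on that signal:

```java
import java.util.concurrent.CountDownLatch;

// Hypothetical sketch: close() blocks until the background load has
// finished, so teardown never races with initialization.
class FakeModel implements AutoCloseable {
    private final CountDownLatch loaded = new CountDownLatch(1);
    private volatile Object nativeHandle;

    FakeModel() {
        new Thread(() -> {
            try { Thread.sleep(100); } catch (InterruptedException ignored) {}
            nativeHandle = new Object();      // stand-in for the native load
            loaded.countDown();               // signal: teardown is now safe
        }).start();
    }

    @Override
    public void close() {
        try { loaded.await(); }               // wait for the load to finish
        catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        nativeHandle = null;                  // free only fully built state
    }

    boolean isUnloaded() { return nativeHandle == null; }
}
```

With a guard like this, the Thread.sleep(1000) workaround above becomes unnecessary: close() itself waits exactly as long as loading takes.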
On the latest 3.4.1 version I get a JVM crash when using this code:
For code where I actually do generation (like in the README), the close() call causes no crash. It does not depend on the model, but qwen2 is small enough to illustrate the issue.
JVM crash log:
Native stacktrace:
clhsdb pstack:
I will later build a -DLLAMA_DEBUG version of the native library and check out the proper stack trace. But to me it sounds like something is not fully initialized on start and gets deleted on close().
hs_err_pid30598.log