kherud / java-llama.cpp

Java Bindings for llama.cpp - A Port of Facebook's LLaMA model in C/C++
MIT License

Segfault on model open and close #80

Status: Open · opened by shuttie 1 month ago

shuttie commented 1 month ago

On the latest version (3.4.1) I get a JVM crash when running this code:

val params = new ModelParameters().setModelFilePath("qwen2-0_5b-instruct-q4_0.gguf")
val model = new LlamaModel(params)
model.close() // <-- crashes here

For code that actually does generation (as in the README), the close() call does not crash. The issue does not depend on the specific model; qwen2 is just small enough to illustrate it.
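For reference, the generation path that does not crash looks roughly like this (adapted from the README; InferenceParameters and generate are the 3.x API names as I understand them, so treat this as a sketch):

import de.kherud.llama.{InferenceParameters, LlamaModel, ModelParameters}

val params = new ModelParameters().setModelFilePath("qwen2-0_5b-instruct-q4_0.gguf")
val model = new LlamaModel(params)
// iterating the generator forces the server loop to actually do work
val infer = new InferenceParameters("Tell me a joke.")
model.generate(infer).forEach(output => print(output))
model.close() // <-- no crash on this path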

JVM crash log:

Stack: [0x00007f3cb1c01000,0x00007f3cb2401000],  sp=0x00007f3cb23ffa08,  free space=8186k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
C  [libc.so.6+0x15ed00]

siginfo: si_signo: 11 (SIGSEGV), si_code: 1 (SEGV_MAPERR), si_addr: 0x00007f3c1cc00000

Native stacktrace:

jhsdb jstack --core core --exe /usr/lib/jvm/openjdk-17/bin/java
Attaching to core core from executable /usr/lib/jvm/openjdk-17/bin/java, please wait...
Debugger attached successfully.
Server compiler detected.
JVM version is 17.0.12+7
Deadlock Detection:

No deadlocks found.

"main" #1 prio=5 tid=0x00007f4da402e130 nid=0x17f2 runnable [0x00007f4daa5fe000]
   java.lang.Thread.State: RUNNABLE
   JavaThread state: _thread_in_native
 - de.kherud.llama.LlamaModel.delete() @bci=0 (Interpreted frame)
 - de.kherud.llama.LlamaModel.close() @bci=1, line=115 (Interpreted frame)
 - ai.nixiesearch.util.LlamaCrash$.main(java.lang.String[]) @bci=43, line=15 (Interpreted frame)
 - ai.nixiesearch.util.LlamaCrash.main(java.lang.String[]) @bci=4 (Interpreted frame)

clhsdb pstack:

----------------- 6130 -----------------
"main" #1 prio=5 tid=0x00007f4da402e130 nid=0x17f2 runnable [0x00007f4daa5fe000]
   java.lang.Thread.State: RUNNABLE
   JavaThread state: _thread_in_native
0x00007f4d5fa3264f      std::_Rb_tree<std::pair<std::string, std::string>, std::pair<std::pair<std::string, std::string> const, int>, std::_Select1st<std::pair<std::pair<std::string, std::string> const, int> >, std::less<std::pair<std::string, std::string> >, std::allocator<std::pair<std::pair<std::string, std::string> const, int> > >::_M_erase(std::_Rb_tree_node<std::pair<std::pair<std::string, std::string> const, int> >*) [clone .isra.0] + 0x2f
Locked ownable synchronizers:
    - None

I will build a -DLLAMA_DEBUG version of the native library later and check the proper stack trace. But to me it looks like something is not fully initialized on start and then gets deleted on close.

hs_err_pid30598.log

shuttie commented 1 month ago

And here's the debug build:

Current thread (0x00007f7124000bd0):  JavaThread "Thread-0" [_thread_in_native, id=16695, stack(0x00007f7250401000,0x00007f7250c01000)]

Stack: [0x00007f7250401000,0x00007f7250c01000],  sp=0x00007f7250bfeaa0,  free space=8182k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
C  [libggml.so+0x57c4e]  ggml_backend_buffer_clear+0x15
C  [libllama.so+0x262087]  llama_kv_cache_clear(llama_kv_cache&)+0xe4
C  [libllama.so+0x2c0fe8]  llama_kv_cache_clear+0x1e
C  [libjllama.so+0x2590a0]  server_context::kv_cache_clear()+0x1c
C  [libjllama.so+0x26378e]  server_context::update_slots()+0x75e
C  [libjllama.so+0x2bdd02]  void std::__invoke_impl<void, void (server_context::*&)(), server_context*&>(std::__invoke_memfun_deref, void (server_context::*&)(), server_context*&)+0x67
C  [libjllama.so+0x2b7fa5]  std::__invoke_result<void (server_context::*&)(), server_context*&>::type std::__invoke<void (server_context::*&)(), server_context*&>(void (server_context::*&)(), server_context*&)+0x37
C  [libjllama.so+0x2afc42]  void std::_Bind<void (server_context::*(server_context*))()>::__call<void, , 0ul>(std::tuple<>&&, std::_Index_tuple<0ul>)+0x48
C  [libjllama.so+0x2a9df2]  void std::_Bind<void (server_context::*(server_context*))()>::operator()<, void>()+0x24
C  [libjllama.so+0x29e3a2]  void std::__invoke_impl<void, std::_Bind<void (server_context::*(server_context*))()>&>(std::__invoke_other, std::_Bind<void (server_context::*(server_context*))()>&)+0x20
C  [libjllama.so+0x294137]  std::enable_if<std::__and_<std::is_void<void>, std::__is_invocable<std::_Bind<void (server_context::*(server_context*))()>&> >::value, void>::type std::__invoke_r<void, std::_Bind<void (server_context::*(server_context*))()>&>(std::_Bind<void (server_context::*(server_context*))()>&)+0x20
C  [libjllama.so+0x283fc1]  std::_Function_handler<void (), std::_Bind<void (server_context::*(server_context*))()> >::_M_invoke(std::_Any_data const&)+0x20
C  [libjllama.so+0x26e7f4]  std::function<void ()>::operator()() const+0x32
C  [libjllama.so+0x25286e]  server_queue::start_loop()+0x23c
C  [libjllama.so+0x2471e2]  Java_de_kherud_llama_LlamaModel_loadModel::{lambda()#1}::operator()() const+0xa4
C  [libjllama.so+0x24bffa]  void std::__invoke_impl<void, Java_de_kherud_llama_LlamaModel_loadModel::{lambda()#1}>(std::__invoke_other, Java_de_kherud_llama_LlamaModel_loadModel::{lambda()#1}&&)+0x20
C  [libjllama.so+0x24bfaf]  std::__invoke_result<Java_de_kherud_llama_LlamaModel_loadModel::{lambda()#1}>::type std::__invoke<Java_de_kherud_llama_LlamaModel_loadModel::{lambda()#1}>(Java_de_kherud_llama_LlamaModel_loadModel::{lambda()#1}&&)+0x20
C  [libjllama.so+0x24bf5c]  void std::thread::_Invoker<std::tuple<Java_de_kherud_llama_LlamaModel_loadModel::{lambda()#1}> >::_M_invoke<0ul>(std::_Index_tuple<0ul>)+0x28
C  [libjllama.so+0x24bf30]  std::thread::_Invoker<std::tuple<Java_de_kherud_llama_LlamaModel_loadModel::{lambda()#1}> >::operator()()+0x18
C  [libjllama.so+0x24bf14]  std::thread::_State_impl<std::thread::_Invoker<std::tuple<Java_de_kherud_llama_LlamaModel_loadModel::{lambda()#1}> > >::_M_run()+0x1c

siginfo: si_signo: 11 (SIGSEGV), si_code: 1 (SEGV_MAPERR), si_addr: 0x00007f7553bc5329

Registers:
RAX=0x00007f7553bc52f1, RBX=0x00007f72a49dc9c0, RCX=0x0000000000000235, RDX=0x00007f7250bfead8
RSP=0x00007f7250bfeaa0, RBP=0x00007f7250bfeab0, RSI=0x0000000000000000, RDI=0x00007f7553bc52f1
R8 =0x0000000000000002, R9 =0x0000000000000001, R10=0x000000000000000a, R11=0x00007f725e859084
R12=0xffffffffffffff88, R13=0x0000000000000002, R14=0x00007f72a9b4d980, R15=0x00007f72a9b4da87
RIP=0x00007f725f5ecc4e, EFLAGS=0x0000000000010206, CSGSFS=0x002b000000000033, ERR=0x0000000000000004
  TRAPNO=0x000000000000000e
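Reading this trace: Java_de_kherud_llama_LlamaModel_loadModel starts a background thread that runs server_queue::start_loop(), and the crash happens while that thread is inside update_slots() clearing the KV cache. In shape, the problem looks like the following hypothetical sketch (not the actual binding code):

class NativeModelSketch {
  @volatile private var closed = false

  // loadModel() spawns a worker that keeps servicing requests,
  // analogous to server_queue::start_loop() -> update_slots()
  private val worker = new Thread(() => {
    while (!closed) {
      // ... touches the KV cache / native buffers here ...
    }
  })
  worker.start()

  def close(): Unit = {
    closed = true
    // If native memory is freed here without worker.join(), the
    // worker may still be inside update_slots() and hit a
    // use-after-free -> SIGSEGV, as in the trace above.
  }
}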
shuttie commented 1 month ago

And the last bit:

val params = new ModelParameters().setModelFilePath("qwen2-0_5b-instruct-q4_0.gguf")
val model = new LlamaModel(params)
Thread.sleep(1000)
model.close() // <-- no crash!

So it seems like a race condition on startup: the model is not yet fully loaded, but we already start unloading it.
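If that is right, then until close() properly waits for the server loop, a workaround that does not depend on timing might be to force one tiny completion before closing, so loading is known to have finished. Untested sketch; complete and setNPredict are the 3.x API names as I understand them from the README:

val params = new ModelParameters().setModelFilePath("qwen2-0_5b-instruct-q4_0.gguf")
val model = new LlamaModel(params)
// one-token completion as a readiness barrier (hypothetical workaround)
model.complete(new InferenceParameters("hi").setNPredict(1))
model.close()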