Closed andychenbruce closed 1 year ago
I have no idea how bindgen works, but `llm-chain-llama-sys/src/bindings.rs:763` has

```rust
const LLAMA_MAX_DEVICES: usize = 1;
```

and then 10 lines later, on line 773, it has

```rust
pub struct llama_context_params {
    ...
    pub tensor_split: [::std::os::raw::c_float; LLAM_MAX_DEVICES],
    ...
}
```

and yes, the spelling mistake is intentional: `LLAM_MAX_DEVICES` is missing an A. You can replace `LLAM_MAX_DEVICES` with `asdfasdf` and it will still compile, so I don't understand where it is getting the symbol from. Regardless, I think `bindings.rs:773` should have `LLAMA_MAX_DEVICES`, not `LLAM_MAX_DEVICES`, and I don't see how that isn't a compile error right now.
EDIT: NVM, it looks like I can delete `bindings.rs` and it still works? I think bindgen regenerates it or something. It still sets `LLAMA_MAX_DEVICES` to `1` instead of correctly to `16`.
EDIT2: NVM again, it looks like `build.rs` only uses the `bindings.rs` file if bindgen fails to generate the bindings.
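A minimal sketch of the fallback behavior described here (hypothetical helper; not the crate's actual `build.rs` code, which does this around its bindgen invocation):

```rust
// Hypothetical sketch: prefer freshly generated bindings, and fall back
// to the checked-in bindings.rs only when bindgen fails. Names here are
// illustrative, not the crate's real API.
fn choose_bindings(generated: Result<String, String>, bundled: &str) -> String {
    match generated {
        // bindgen succeeded: use the freshly generated bindings
        Ok(code) => code,
        // bindgen failed (e.g. no libclang): ship the pre-generated file
        Err(_) => bundled.to_string(),
    }
}

fn main() {
    let bundled = "const LLAMA_MAX_DEVICES: usize = 1;";
    // Simulate bindgen failing:
    let out = choose_bindings(Err("no libclang".to_string()), bundled);
    println!("{}", out);
}
```

This is why deleting `bindings.rs` changed nothing on a machine where bindgen succeeds: the bundled file never gets used there.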
EDIT3: llama.cpp's Makefile passes `-DGGML_USE_CUBLAS` as a flag. If you put `.clang_arg("-DGGML_USE_CUBLAS")` into `build.rs:36`, it actually sets the correct array size, but that causes some compile errors in `llm-chain-llama` because now the array lengths are mismatched: 16 instead of 1.
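The flag-forwarding idea boils down to making bindgen's libclang see the same define that llama.cpp is compiled with. A minimal sketch (hypothetical helper; the real `build.rs` would pass these via bindgen's `.clang_arg`):

```rust
// Sketch: compute the extra clang args bindgen should see. When the
// `cuda` feature is on, mirror llama.cpp's Makefile, which compiles with
// -DGGML_USE_CUBLAS; without the define, libclang sees LLAMA_MAX_DEVICES
// as 1 and the generated struct layout is wrong for the CUDA build.
fn extra_clang_args(cuda_enabled: bool) -> Vec<&'static str> {
    let mut args = Vec::new();
    if cuda_enabled {
        args.push("-DGGML_USE_CUBLAS");
    }
    args
}

fn main() {
    println!("{:?}", extra_clang_args(false)); // []
    println!("{:?}", extra_clang_args(true));  // ["-DGGML_USE_CUBLAS"]
}
```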
Ok, I think I fixed it.

If the cuda feature is enabled, then `build.rs` needs to pass `.clang_arg("-DGGML_USE_CUBLAS")` into bindgen's libclang invocation. That will hopefully generate the correct bindings. Then in `llm-chain-llama/src/context.rs:23`, the hardcoded

```rust
const LLAMA_MAX_DEVICES: usize = 1;
```

should be changed to use the bindings' value. One way is to do

```rust
const LLAMA_MAX_DEVICES: usize = llm_chain_llama_sys::LLAMA_MAX_DEVICES as usize;
```

Then it actually compiles. Currently I don't see any way to actually tell it to use the GPU, since the options don't have an `Opt` to set `num_gpu_layers`, but at least it doesn't segfault anymore. I'll try to submit a pull request rn.
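The `context.rs` change can be sketched like this (the `-sys` crate is mocked here so the snippet stands alone; real code would reference `llm_chain_llama_sys::LLAMA_MAX_DEVICES` directly, and the value 16 is assumed for illustration):

```rust
// Mock of the constant bindgen emits into the -sys crate once the
// cuda bindings are generated correctly.
mod llm_chain_llama_sys_mock {
    pub const LLAMA_MAX_DEVICES: u32 = 16;
}

// Instead of the hardcoded `const LLAMA_MAX_DEVICES: usize = 1;`,
// derive the value from the bindings so it always matches the C side.
const LLAMA_MAX_DEVICES: usize =
    llm_chain_llama_sys_mock::LLAMA_MAX_DEVICES as usize;

fn main() {
    // tensor_split now has whatever length bindgen generated.
    let tensor_split = [0.0f32; LLAMA_MAX_DEVICES];
    println!("{}", tensor_split.len()); // 16
}
```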
If I have `llm-chain-llama-sys = "0.12.3"` in `Cargo.toml` it runs fine, but `llm-chain-llama-sys = { version = "0.12.3", features = ["cuda"] }` causes the program to segfault.

I tracked down where it was by adding `.arg("-DCMAKE_BUILD_TYPE=Debug")` to `llm-chain-llama-sys/build.rs:84` to tell CMake to add debug symbols to llama.cpp. Then, stepping through the program in `gdb`, I found the segfault (valgrind says it tries to `Jump to the invalid address stated on the next line 0x0: ???`) at what I believe to be the first FFI call the program makes.

My rust code calls `llm_chain_llama::Executor::new_with_options(options)`, which eventually reaches `llm-chain-llama/src/context.rs:42`, an `unsafe` block calling the FFI function `llama_context_default_params`, which starts on `llama.cpp/llama.cpp:864`.

When `gdb` enters `llama_context_default_params`, running `bt` shows a correct backtrace leading back to the rust program. After stepping over the struct initialization, `bt` now shows that the rust program tries to return to `0x00000000`. I assume it's because the stack frame is getting messed up. The C++ function `llama_context_default_params`
just returns a struct, so the struct size is probably wrong.

I think I found the problem. Before the struct initialization on `llama.cpp:864`, I added a line printing the struct's size, and in rust I added a line printing the size of the bindgen-generated struct. If I don't enable `features = ["cuda"]`, they both print the same size of `48`, but if I do have `features = ["cuda"]` they print different sizes, which I think is the problem.
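The mismatch can be reproduced in isolation: a `#[repr(C)]` struct's size depends directly on the `tensor_split` array length, so bindings generated with `LLAMA_MAX_DEVICES == 1` disagree with a library compiled with `16`. (Only `tensor_split` is taken from `llama.h`; the other field is a placeholder, so the sizes below are illustrative, not llama.cpp's real 48.)

```rust
use std::mem::size_of;

// What the Rust side thinks it receives when the bindings were
// generated without -DGGML_USE_CUBLAS (LLAMA_MAX_DEVICES == 1).
#[repr(C)]
struct ParamsCpuBindings {
    tensor_split: [f32; 1],
    n_ctx: i32, // placeholder field, not llama.h's real layout
}

// What the C++ side actually returns when compiled with
// -DLLAMA_CUBLAS=ON (LLAMA_MAX_DEVICES == 16).
#[repr(C)]
struct ParamsCudaLibrary {
    tensor_split: [f32; 16],
    n_ctx: i32, // placeholder field, not llama.h's real layout
}

fn main() {
    // The two sides disagree on the size, so returning the struct by
    // value across the FFI boundary clobbers the caller's stack frame.
    println!("bindings expect: {} bytes", size_of::<ParamsCpuBindings>()); // 8
    println!("library returns: {} bytes", size_of::<ParamsCudaLibrary>()); // 68
}
```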
The struct `llama_context_params` is defined in `llama.h:74`, and I think the problem is that it has a member `float tensor_split[LLAMA_MAX_DEVICES]`. When cuda is enabled, `build.rs:88` passes in the build flag `-DLLAMA_CUBLAS=ON`, and on `llama.h:5` some preprocessor `ifdef`s change the value of `LLAMA_MAX_DEVICES`. That changes the size of the struct, which breaks the cpp-to-rust bindings: the stack gets messed up and causes the segfault. I think it has something to do with how bindgen handles preprocessor stuff.
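One way to catch this class of bug at build time rather than as a runtime stack corruption, sketched under the assumption that the fixed bindings expose `LLAMA_MAX_DEVICES` (the `-sys` constant is mocked here so the snippet stands alone):

```rust
// Mocked stand-in for the bindgen-emitted constant; real code would
// reference llm_chain_llama_sys::LLAMA_MAX_DEVICES.
mod sys_mock {
    pub const LLAMA_MAX_DEVICES: usize = 16;
}

// The tensor_split length the higher-level crate assumes.
const ASSUMED_DEVICES: usize = 16;

// Compile-time check: if the generated bindings ever disagree with this
// assumption, the build fails here instead of segfaulting inside FFI.
const _: () = assert!(
    sys_mock::LLAMA_MAX_DEVICES == ASSUMED_DEVICES,
    "LLAMA_MAX_DEVICES mismatch between bindings and crate assumption"
);

fn main() {
    println!("tensor_split length checked at compile time: {}", ASSUMED_DEVICES);
}
```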