Closed andychenbruce closed 1 year ago
I have no idea how bindgen works, but `llm-chain-llama-sys/src/bindings.rs:763` has

```rust
const LLAMA_MAX_DEVICES: usize = 1;
```

and then 10 lines later, on line 773, it has

```rust
pub struct llama_context_params {
    ...
    pub tensor_split: [::std::os::raw::c_float; LLAM_MAX_DEVICES],
    ...
}
```

and yes, the spelling mistake is intentional: `LLAM_MAX_DEVICES` is missing an A. You can replace `LLAM_MAX_DEVICES` with `asdfasdf` and it will still compile, so I don't understand where it is getting the symbol from. Regardless, I think `bindings.rs:773` should have `LLAMA_MAX_DEVICES`, not `LLAM_MAX_DEVICES`, and I don't see how that isn't a compile error right now.
EDIT: NVM, it looks like I can delete `bindings.rs` and it still works? I think bindgen regenerates it or something. It still sets `LLAMA_MAX_DEVICES` to `1` instead of correctly to `16`.
EDIT2: NVM again, it looks like `build.rs` only uses the `bindings.rs` file if bindgen fails to generate the bindings.
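A minimal sketch of the fallback behavior described here (hypothetical helper; not the crate's actual `build.rs` code, which does this around its bindgen invocation):

```rust
// Hypothetical sketch: prefer freshly generated bindings, and fall back
// to the checked-in bindings.rs only when bindgen fails. Names here are
// illustrative, not the crate's real API.
fn choose_bindings(generated: Result<String, String>, bundled: &str) -> String {
    match generated {
        // bindgen succeeded: use the freshly generated bindings
        Ok(code) => code,
        // bindgen failed (e.g. no libclang): ship the pre-generated file
        Err(_) => bundled.to_string(),
    }
}

fn main() {
    let bundled = "const LLAMA_MAX_DEVICES: usize = 1;";
    // Simulate bindgen failing:
    let out = choose_bindings(Err("no libclang".to_string()), bundled);
    println!("{}", out);
}
```

This is why deleting `bindings.rs` changed nothing on a machine where bindgen succeeds: the bundled file never gets used there.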
EDIT3: llama.cpp's Makefile passes `-DGGML_USE_CUBLAS` as a flag. If you put `.clang_arg("-DGGML_USE_CUBLAS")` into `build.rs:36`, it actually sets the correct array size, but that causes some compile errors in `llm-chain-llama` because now the array lengths are mismatched: 16 instead of 1.
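The flag-forwarding idea boils down to making bindgen's libclang see the same define that llama.cpp is compiled with. A minimal sketch (hypothetical helper; the real `build.rs` would pass these via bindgen's `.clang_arg`):

```rust
// Sketch: compute the extra clang args bindgen should see. When the
// `cuda` feature is on, mirror llama.cpp's Makefile, which compiles with
// -DGGML_USE_CUBLAS; without the define, libclang sees LLAMA_MAX_DEVICES
// as 1 and the generated struct layout is wrong for the CUDA build.
fn extra_clang_args(cuda_enabled: bool) -> Vec<&'static str> {
    let mut args = Vec::new();
    if cuda_enabled {
        args.push("-DGGML_USE_CUBLAS");
    }
    args
}

fn main() {
    println!("{:?}", extra_clang_args(false)); // []
    println!("{:?}", extra_clang_args(true));  // ["-DGGML_USE_CUBLAS"]
}
```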
Ok, I think I fixed it.

If the cuda feature is enabled, then `build.rs` needs to pass `.clang_arg("-DGGML_USE_CUBLAS")` into bindgen's libclang invocation. That will hopefully generate the correct bindings. Then in `llm-chain-llama/src/context.rs:23`, the hardcoded

```rust
const LLAMA_MAX_DEVICES: usize = 1;
```

should be changed to use the bindings' value. One way is to do

```rust
const LLAMA_MAX_DEVICES: usize = llm_chain_llama_sys::LLAMA_MAX_DEVICES as usize;
```

Then it actually compiles. Currently I don't see any way to actually tell it to use the GPU, since the options don't have an `Opt` to set `num_gpu_layers`, but at least it doesn't segfault anymore. I'll try to submit a pull request rn.
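The `context.rs` change can be sketched like this (the `-sys` crate is mocked here so the snippet stands alone; real code would reference `llm_chain_llama_sys::LLAMA_MAX_DEVICES` directly, and the value 16 is assumed for illustration):

```rust
// Mock of the constant bindgen emits into the -sys crate once the
// cuda bindings are generated correctly.
mod llm_chain_llama_sys_mock {
    pub const LLAMA_MAX_DEVICES: u32 = 16;
}

// Instead of the hardcoded `const LLAMA_MAX_DEVICES: usize = 1;`,
// derive the value from the bindings so it always matches the C side.
const LLAMA_MAX_DEVICES: usize =
    llm_chain_llama_sys_mock::LLAMA_MAX_DEVICES as usize;

fn main() {
    // tensor_split now has whatever length bindgen generated.
    let tensor_split = [0.0f32; LLAMA_MAX_DEVICES];
    println!("{}", tensor_split.len()); // 16
}
```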
If I have `llm-chain-llama-sys = "0.12.3"` in `Cargo.toml` it runs fine, but `llm-chain-llama-sys = { version = "0.12.3", features = ["cuda"] }` causes the program to segfault.

I tracked down where it was by adding `.arg("-DCMAKE_BUILD_TYPE=Debug")` to `llm-chain-llama-sys/build.rs:84` to tell CMake to add debug symbols to llama.cpp. Then, stepping through the program in `gdb`, I found the segfault (valgrind says it tries to `Jump to the invalid address stated on the next line 0x0: ???`) at what I believe to be the first FFI call the program makes.

My rust code calls `llm_chain_llama::Executor::new_with_options(options)`, which eventually reaches `llm-chain-llama/src/context.rs:42`, an `unsafe` block calling the FFI function `llama_context_default_params`, which starts on `llama.cpp/llama.cpp:864`.

When `gdb` enters `llama_context_default_params`, running `bt` shows a correct backtrace leading back to the rust program. After stepping over the struct initialization, `bt` now shows that the rust program tries to return to `0x00000000`. I assume it's because the stack frame is getting messed up. The C++ function `llama_context_default_params`
just returns a struct, so the struct size is probably wrong.

I think I found the problem. Before the struct initialization on `llama.cpp:864`, I added a line printing the struct's size, and in rust I added a line printing the size of the bindgen-generated struct. If I don't enable `features = ["cuda"]`, they both print the same size of `48`, but if I do have `features = ["cuda"]` they print different sizes, which I think is the problem.
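The mismatch can be reproduced in isolation: a `#[repr(C)]` struct's size depends directly on the `tensor_split` array length, so bindings generated with `LLAMA_MAX_DEVICES == 1` disagree with a library compiled with `16`. (Only `tensor_split` is taken from `llama.h`; the other field is a placeholder, so the sizes below are illustrative, not llama.cpp's real 48.)

```rust
use std::mem::size_of;

// What the Rust side thinks it receives when the bindings were
// generated without -DGGML_USE_CUBLAS (LLAMA_MAX_DEVICES == 1).
#[repr(C)]
struct ParamsCpuBindings {
    tensor_split: [f32; 1],
    n_ctx: i32, // placeholder field, not llama.h's real layout
}

// What the C++ side actually returns when compiled with
// -DLLAMA_CUBLAS=ON (LLAMA_MAX_DEVICES == 16).
#[repr(C)]
struct ParamsCudaLibrary {
    tensor_split: [f32; 16],
    n_ctx: i32, // placeholder field, not llama.h's real layout
}

fn main() {
    // The two sides disagree on the size, so returning the struct by
    // value across the FFI boundary clobbers the caller's stack frame.
    println!("bindings expect: {} bytes", size_of::<ParamsCpuBindings>()); // 8
    println!("library returns: {} bytes", size_of::<ParamsCudaLibrary>()); // 68
}
```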
The struct `llama_context_params` is defined in `llama.h:74`, and I think the problem is that it has a member `float tensor_split[LLAMA_MAX_DEVICES]`. When cuda is enabled, `build.rs:88` passes in the build flag `-DLLAMA_CUBLAS=ON`, and on `llama.h:5` some preprocessor `ifdef`s change the value of `LLAMA_MAX_DEVICES`. That changes the size of the struct, which breaks the cpp-to-rust bindings: the stack gets messed up and causes the segfault. I think it has something to do with how bindgen handles preprocessor stuff.
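One way to catch this class of bug at build time rather than as a runtime stack corruption, sketched under the assumption that the fixed bindings expose `LLAMA_MAX_DEVICES` (the `-sys` constant is mocked here so the snippet stands alone):

```rust
// Mocked stand-in for the bindgen-emitted constant; real code would
// reference llm_chain_llama_sys::LLAMA_MAX_DEVICES.
mod sys_mock {
    pub const LLAMA_MAX_DEVICES: usize = 16;
}

// The tensor_split length the higher-level crate assumes.
const ASSUMED_DEVICES: usize = 16;

// Compile-time check: if the generated bindings ever disagree with this
// assumption, the build fails here instead of segfaulting inside FFI.
const _: () = assert!(
    sys_mock::LLAMA_MAX_DEVICES == ASSUMED_DEVICES,
    "LLAMA_MAX_DEVICES mismatch between bindings and crate assumption"
);

fn main() {
    println!("tensor_split length checked at compile time: {}", ASSUMED_DEVICES);
}
```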