mdrokz / rust-llama.cpp

LLama.cpp rust bindings
https://crates.io/crates/llama_cpp_rs/
MIT License

Slow Performance compared to Python Binding #19

Closed by JewishLewish 8 months ago

JewishLewish commented 8 months ago

I've been playing around with the Python and Rust bindings of llama.cpp and noticed that Python was producing output much faster despite the same model and input.

When I printed out the args/specs of the run, I noticed that some settings the Python binding was using were missing from the Rust binding:

llm_load_print_meta: f_clamp_kqv      = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: rope scaling     = linear
llm_load_print_meta: n_yarn_orig_ctx  = 2048

I am not sure whether Python is using better settings or I am passing poor ones myself; I have been experimenting with threads, n_batch, batch, and n_gpu_layers. I tried to find documentation in the Rust code but couldn't find anything.
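For reference, here is a minimal sketch of how these settings can be passed explicitly through llama_cpp_rs rather than relying on defaults. The field names (`context_size`, `n_gpu_layers`, `threads`, `batch`) follow the crate's `ModelOptions`/`PredictOptions` structs as I understand them, and the model path and values are placeholders; check the docs for your crate version:

```rust
use llama_cpp_rs::{
    options::{ModelOptions, PredictOptions},
    LLama,
};

fn main() {
    // Model-level settings (assumption: field names per the crate's
    // ModelOptions struct; verify against your version's docs).
    let model_options = ModelOptions {
        context_size: 2048, // roughly n_ctx in the Python binding
        n_gpu_layers: 32,   // hypothetical value; tune for your GPU
        ..Default::default()
    };

    let llama = LLama::new("path/to/model.bin".into(), &model_options)
        .expect("failed to load model");

    // Per-call generation settings (assumption: PredictOptions fields).
    let predict_options = PredictOptions {
        tokens: 256, // max tokens to generate
        threads: 8,  // number of CPU threads
        batch: 512,  // prompt-processing batch size, like n_batch in Python
        ..Default::default()
    };

    let output = llama
        .predict("Hello, world".into(), predict_options)
        .expect("prediction failed");
    println!("{}", output);
}
```

Comparing the `llm_load_print_meta` output after setting these explicitly on both bindings should show whether the defaults are the source of the gap.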

Ex. (Python's binding): [screenshot of the Python binding's load output]

Any recommendations?

JewishLewish commented 8 months ago

Closing it.

Turns out the documentation and variable names in the Rust binding are lacking/inconsistent.