Closed JewishLewish closed 8 months ago
I've been playing around with the Python and Rust bindings of llama.cpp and noticed that Python was producing content much faster despite using the same model and input.
When I printed out the args/specs of the run, I noticed some settings were present in the Python binding but missing from the Rust binding:
```
llm_load_print_meta: f_clamp_kqv      = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: rope scaling     = linear
llm_load_print_meta: n_yarn_orig_ctx  = 2048
```
I am not sure whether Python is using better settings or I have chosen poor ones myself, since I've been experimenting with threads, n_batch, batch, and n_gpu_layers. I tried to find comments in the Rust code but couldn't find anything.
Ex. (Python's binding): ![image](https://github.com/mdrokz/rust-llama.cpp/assets/65754609/79e9e6d3-8551-4a9e-be6b-97ed3419ace0)
Any recommendations?
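For comparison, here is roughly how those knobs are passed on the Python side. The keyword arguments (`n_ctx`, `n_threads`, `n_batch`, `n_gpu_layers`) are real llama-cpp-python constructor parameters, but the model path and values below are placeholders, so treat this as a sketch rather than my exact invocation:

```python
# Settings being compared against the Rust binding; values are examples only.
kwargs = dict(
    n_ctx=2048,       # context window (matches n_yarn_orig_ctx above)
    n_threads=8,      # CPU threads used for generation
    n_batch=512,      # prompt-processing batch size
    n_gpu_layers=32,  # layers offloaded to the GPU
)

try:
    from llama_cpp import Llama  # pip install llama-cpp-python

    # Placeholder model path; point this at your own GGUF file.
    llm = Llama(model_path="models/model.Q4_K_M.gguf", **kwargs)
    print(llm("Hello", max_tokens=16))
except ImportError:
    # Library not installed; the dict above still documents the knobs.
    print("llama-cpp-python not installed; settings:", kwargs)
```

The Rust binding exposes similarly named options, but as noted below, the names and docs don't line up one-to-one with these.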
Closing it.
Turns out that the documentation/variable names in the Rust binding are lacking / off.