mdrokz / rust-llama.cpp

llama.cpp Rust bindings
https://crates.io/crates/llama_cpp_rs/
MIT License

Error in loading models #16

Closed · RodrigoSdeCarvalho closed this 8 months ago

RodrigoSdeCarvalho commented 8 months ago

Hello!

I'm trying to run the basic CPU example in the repo and I'm facing the following error when trying to load the "wizard-vicuna-13B.ggmlv3.q4_0.bin" model:

gguf_init_from_file: invalid magic number 67676a74
error loading model: llama_model_loader: failed to load model from /<hidden>/models/wizard-vicuna-13B.ggmlv3.q4_0.bin

llama_load_model_from_file: failed to load model

called `Result::unwrap()` on an `Err` value: "Failed to load model"
thread 'llama::tests::cuda_inference' panicked at 'called `Result::unwrap()` on an `Err` value: "Failed to load model"', app/llm/src/llama.rs:84:127
stack backtrace:
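
For reference, the loading code I'm running is essentially the repo's basic CPU example; below is a minimal sketch of it (the model path and option values are my own placeholders, and field names follow the README as I understand it):

```rust
use llama_cpp_rs::{
    options::{ModelOptions, PredictOptions},
    LLama,
};

fn main() {
    // Placeholder path; in my case this points at the ggmlv3 .bin file above.
    let model_path = String::from("models/wizard-vicuna-13B.ggmlv3.q4_0.bin");

    // Default options, CPU only (no layers offloaded to the GPU).
    let model_options = ModelOptions::default();

    // This is the call that fails with "Failed to load model".
    let llama = LLama::new(model_path, &model_options).unwrap();

    // Plain prediction options; the values here are arbitrary.
    let predict_options = PredictOptions {
        tokens: 64,
        threads: 8,
        ..Default::default()
    };

    llama
        .predict("Hello, how are you?".into(), predict_options)
        .unwrap();
}
```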

Then I tried other .gguf models, and in every attempt the code would load the model but get stuck during prediction until it eventually failed with a free() error (which would take a few minutes).

Does llama.cpp not support .bin files? Or are the Llama models just too heavy to run on my notebook (I have an Intel® Core™ i5-12500H and an NVIDIA® GeForce RTX™ 3050 Ti with 4 GB of GDDR6)?

mdrokz commented 8 months ago


Hey!

Since the latest update, llama.cpp only supports GGUF models, so older GGML models won't work. Can you tell me which GGUF model you tried? I'll test it on my end.
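
For what it's worth, the `invalid magic number 67676a74` in your log is the ASCII tag `ggjt`, i.e. one of the old GGML formats, while current llama.cpp expects files that start with the `GGUF` magic. A quick way to check a file before loading it, just a plain-Rust sketch with a placeholder path:

```rust
use std::fs::File;
use std::io::Read;

fn main() -> std::io::Result<()> {
    // Placeholder path; point this at the model you want to inspect.
    let mut file = File::open("models/llama-2-7b.Q5_K_S.gguf")?;

    // GGUF files begin with the four ASCII bytes "GGUF".
    let mut magic = [0u8; 4];
    file.read_exact(&mut magic)?;

    if &magic == b"GGUF" {
        println!("Looks like a GGUF file; current llama.cpp should load it.");
    } else {
        // Older GGML-family files carry a different magic
        // (e.g. the 0x67676a74 / "ggjt" tag from the error log).
        println!("Not GGUF (first bytes: {:02x?}); it likely needs conversion.", magic);
    }
    Ok(())
}
```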

Sorry for the late reply.

RodrigoSdeCarvalho commented 8 months ago

I tried this model: https://huggingface.co/TheBloke/Llama-2-7B-GGUF/blob/main/llama-2-7b.Q5_K_S.gguf

On CPU it runs for a long time before failing with a free() error.

mdrokz commented 8 months ago


Let me try it on my end and see what's happening. Thanks.

mdrokz commented 8 months ago


Hey, I tried it on my end and it works for me. I'm using a Ryzen 7 3800X with 32 GB of RAM. What's your CPU & RAM config?

RodrigoSdeCarvalho commented 8 months ago

I have 16 GB of RAM, an NVIDIA® GeForce RTX™ 3050 with 6 GB of GDDR6, and a 13th Gen Intel® Core™ i5-13450HX. Perhaps it's an issue on my end then. Thanks anyway!
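
For anyone hitting the same free/abort on a machine with limited RAM, one thing worth trying is shrinking the context and offloading a few layers to the GPU so the host-side allocation is smaller. A hedged sketch, assuming `context_size` and `n_gpu_layers` fields on `ModelOptions` (as suggested by the repo's CUDA example; exact field names and defaults may differ):

```rust
use llama_cpp_rs::{options::ModelOptions, LLama};

fn main() {
    // Field names assumed from the repo's examples; adjust if they differ.
    let model_options = ModelOptions {
        context_size: 512, // smaller context -> smaller KV cache in RAM
        n_gpu_layers: 12,  // push some layers onto the 4-6 GB GPU
        ..Default::default()
    };

    let llama = LLama::new("models/llama-2-7b.Q5_K_S.gguf".into(), &model_options)
        .expect("Failed to load model");
    let _ = llama; // prediction omitted; see the basic example above
}
```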