Hello!
I'm trying to run the basic CPU example in the repo and I'm facing the following error when trying to load the "wizard-vicuna-13B.ggmlv3.q4_0.bin" model:
gguf_init_from_file: invalid magic number 67676a74
error loading model: llama_model_loader: failed to load model from /<hidden>/models/wizard-vicuna-13B.ggmlv3.q4_0.bin
llama_load_model_from_file: failed to load model
thread 'llama::tests::cuda_inference' panicked at 'called `Result::unwrap()` on an `Err` value: "Failed to load model"', app/llm/src/llama.rs:84:127
stack backtrace:
Then I tried other .gguf models, and in all my attempts the code would load the model but get stuck during prediction until a free error occurred (which would take some minutes).
Does llama.cpp no longer support .bin files, or are the Llama models just too heavy to run on my notebook (an Intel® Core™ i5-12500H with an NVIDIA® GeForce RTX™ 3050 Ti, 4 GB GDDR6)?
Hey!
Since the latest update, llama.cpp only supports GGUF models, so older GGML models won't work. Can you tell me which GGUF model you tried? I'll do some testing on my end.
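For what it's worth, the `invalid magic number 67676a74` in your log points the same way: that value is the magic of the old GGJT (GGML v3) container, while GGUF files begin with the bytes `GGUF`. Below is a minimal sketch (standard library only; the path is just the one from the log) of checking a file's magic before handing it to llama.cpp:

```rust
use std::fs::File;
use std::io::Read;

/// Read the 4-byte magic at the start of a model file and report
/// whether it looks like GGUF or the old GGJT (GGML v3) container.
fn detect_model_format(path: &str) -> std::io::Result<&'static str> {
    let mut buf = [0u8; 4];
    File::open(path)?.read_exact(&mut buf)?;
    Ok(match u32::from_le_bytes(buf) {
        // bytes b"GGUF" read as a little-endian u32
        0x4655_4747 => "GGUF -- supported by current llama.cpp",
        // the 0x67676a74 value from the error log
        0x6767_6a74 => "GGJT / GGML v3 -- no longer supported, needs a GGUF version",
        _ => "unknown format",
    })
}

fn main() -> std::io::Result<()> {
    // Example path, matching the model from the issue.
    let path = "models/wizard-vicuna-13B.ggmlv3.q4_0.bin";
    println!("{}: {}", path, detect_model_format(path)?);
    Ok(())
}
```

If it prints the GGJT line, grabbing a GGUF build of the same model (TheBloke publishes them for most) is the simplest fix.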
Sorry for the late reply
I tried this model: https://huggingface.co/TheBloke/Llama-2-7B-GGUF/blob/main/llama-2-7b.Q5_K_S.gguf
On CPU it runs for a long time before failing with a free error.
Let me try it on my end and see what's happening. Thanks!
Hey, I tried it on my end and it works for me. I'm using a Ryzen 7 3800X with 32 GB of RAM. What's your CPU & RAM config?
I have 16 GB of RAM, an NVIDIA® GeForce RTX™ 3050 with 6 GB GDDR6, and a 13th Gen Intel® Core™ i5-13450HX. Perhaps it's an issue on my end then. Thanks anyway!
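As a closing note on the "too heavy" question from the original post, here is a back-of-the-envelope sketch, assuming roughly 6.7B parameters for Llama-2-7B and roughly 5.5 bits per weight for Q5_K_S (both figures are approximations):

```rust
/// Rough RAM estimate for running a Q5_K_S 7B GGUF on CPU.
fn main() {
    let params: f64 = 6.7e9; // approximate Llama-2-7B parameter count
    let bpw: f64 = 5.5;      // approximate Q5_K_S bits per weight
    let weight_bytes = params * bpw / 8.0;
    println!("weights: ~{:.1} GiB", weight_bytes / 1024f64.powi(3));
    // Prints ~4.3 GiB for the weights; even with context buffers on
    // top, that is well within 16 GB of system RAM, so model size
    // alone shouldn't be what makes a 7B Q5_K_S model unrunnable here.
}
```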