hegelai / prompttools

Open-source tools for prompt testing and experimentation, with support for both LLMs (e.g. OpenAI, LLaMA) and vector databases (e.g. Chroma, Weaviate, LanceDB).
http://prompttools.readthedocs.io
Apache License 2.0

It can't load with llama_load_model_from_file: failed to load model #107

Open bienpr opened 7 months ago

bienpr commented 7 months ago

⁉️ Discussion/Question

Hi.

I'm just getting started with prompttools. I have two problems, which I'll post separately.

First, I couldn't load a downloaded model. I got the error below when I tested LlamaCppExperiment.ipynb. I tried several different models and got the same error each time.
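For reference, here is roughly what the failing cell does (the argument names below are my approximation of the example notebook, not the exact cell; the model path is the file from the logs further down):

```python
# Rough sketch of the notebook run that triggers the error. The constructor
# arguments (model_params / call_params) are my approximation of the example
# notebook's usage.
from prompttools.experiment import LlamaCppExperiment

model_paths = ["/Users/sewonist/Downloads/llama-2-7b-chat.ggmlv3.q2_K.bin"]
prompts = ["Who was the first president?"]

experiment = LlamaCppExperiment(
    model_paths,
    prompts,
    model_params={},                     # model-level llama.cpp settings
    call_params={"temperature": [1.0]},  # per-call settings to sweep over
)
experiment.run()  # fails here with the AssertionError shown below
```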

Environments

Logs

gguf_init_from_file: invalid magic characters tjgg(�k.
error loading model: llama_model_loader: failed to load model from /Users/sewonist/Downloads/llama-2-7b-chat.ggmlv3.q2_K.bin

llama_load_model_from_file: failed to load model
AVX = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | F16C = 0 | FP16_VA = 1 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 0 | SSSE3 = 0 | VSX = 0 |
AssertionError                            Traceback (most recent call last)
Cell In[3], line 1
----> 1 experiment.run()

File ~/Projects/13.AIChat/05.Projects/prompttools/prompttools/experiment/experiments/llama_cpp_experiment.py:177, in LlamaCppExperiment.run(self, runs)
    175 latencies = []
    176 for model_combo in self.model_argument_combos:
--> 177     client = Llama(**model_combo)
    178     for call_combo in self.call_argument_combos:
    179         for _ in range(runs):

File ~/anaconda3/envs/prompttools/lib/python3.11/site-packages/llama_cpp/llama.py:923, in Llama.__init__(self, model_path, n_gpu_layers, main_gpu, tensor_split, vocab_only, use_mmap, use_mlock, seed, n_ctx, n_batch, n_threads, n_threads_batch, rope_scaling_type, rope_freq_base, rope_freq_scale, yarn_ext_factor, yarn_attn_factor, yarn_beta_fast, yarn_beta_slow, yarn_orig_ctx, mul_mat_q, f16_kv, logits_all, embedding, last_n_tokens_size, lora_base, lora_scale, lora_path, numa, chat_format, chat_handler, verbose, **kwargs)
    920 self.chat_format = chat_format
    921 self.chat_handler = chat_handler
--> 923 self._n_vocab = self.n_vocab()
    924 self._n_ctx = self.n_ctx()
    926 self._token_nl = self.token_nl()

File ~/anaconda3/envs/prompttools/lib/python3.11/site-packages/llama_cpp/llama.py:2184, in Llama.n_vocab(self)
   2182 def n_vocab(self) -> int:
   2183     """Return the vocabulary size."""
-> 2184     return self._model.n_vocab()

File ~/anaconda3/envs/prompttools/lib/python3.11/site-packages/llama_cpp/llama.py:250, in _LlamaModel.n_vocab(self)
    249 def n_vocab(self) -> int:
--> 250     assert self.model is not None
    251     return llama_cpp.llama_n_vocab(self.model)

AssertionError:

I wonder, does LlamaCpp not support the M series? Please let me know if you have any ideas.

Thanks.

steventkrawczyk commented 7 months ago

Have you downloaded the model llama-2-7b-chat.ggmlv3.q2_K.bin and followed the setup instructions at https://github.com/ggerganov/llama.cpp and https://github.com/abetlen/llama-cpp-python?
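As a quick sanity check, you could also try loading the file directly with llama-cpp-python, outside of prompttools. A rough sketch (the path is the one from your traceback):

```python
# Sketch of a direct load to isolate whether the problem is the model file
# itself rather than prompttools. verbose=True prints llama.cpp's own loader
# messages.
from llama_cpp import Llama

llm = Llama(
    model_path="/Users/sewonist/Downloads/llama-2-7b-chat.ggmlv3.q2_K.bin",
    verbose=True,
)
print(llm.n_vocab())
```

If this fails with the same "invalid magic characters" message, the issue is with the model file or its format rather than with prompttools.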