Hi @leowenlu!
Unfortunately, I have never gotten this error so far, but we can try two things:
1. Pin the lama_cpp version to 0.2.76 here, and replace CMAKE_ARGS="-DGGML_CUDA=on" with CMAKE_ARGS="-DLLAMA_CUBLAS=on" here.
2. Add -v to pip3 install llama-cpp-python==$(llama_cpp_version) to enable verbose output.
Also, some system packages may be required for the build process. On Ubuntu, for example, you might need to install the following:
sudo apt-get update
sudo apt-get install build-essential cmake libopenblas-dev
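For reference, this is roughly what the pinned install boils down to if you run it by hand. It is only a minimal sketch: the exact Makefile wiring may differ, and FORCE_CMAKE=1 is just my assumption to force a from-source build:
CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip3 install -v llama-cpp-python==0.2.76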
Hi @umbertogriffo
Following your instructions, I pinned the lama_cpp version to 0.2.76 but kept CMAKE_ARGS="-DGGML_CUDA=on":
llama_cpp_python==0.2.76 pyllamacpp==1.0.7
I have run make setup_cuda and make update, and both passed.
But with streamlit run chatbot/chatbot_app.py -- --model llama-3 --max-new-tokens 1024, I am getting the following error:
llama_model_load: error loading model: done_getting_tensors: wrong number of tensors; expected 292, got 291,
details:
llama_model_loader: loaded meta data with 33 key-value pairs and 292 tensors from /data/leoprojects/github/rag-chatbot/models/Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = llama
llama_model_loader: - kv 1: general.type str = model
llama_model_loader: - kv 2: general.name str = Meta Llama 3.1 8B Instruct
llama_model_loader: - kv 3: general.finetune str = Instruct
llama_model_loader: - kv 4: general.basename str = Meta-Llama-3.1
llama_model_loader: - kv 5: general.size_label str = 8B
llama_model_loader: - kv 6: general.license str = llama3.1
llama_model_loader: - kv 7: general.tags arr[str,6] = ["facebook", "meta", "pytorch", "llam...
llama_model_loader: - kv 8: general.languages arr[str,8] = ["en", "de", "fr", "it", "pt", "hi", ...
llama_model_loader: - kv 9: llama.block_count u32 = 32
llama_model_loader: - kv 10: llama.context_length u32 = 131072
llama_model_loader: - kv 11: llama.embedding_length u32 = 4096
llama_model_loader: - kv 12: llama.feed_forward_length u32 = 14336
llama_model_loader: - kv 13: llama.attention.head_count u32 = 32
llama_model_loader: - kv 14: llama.attention.head_count_kv u32 = 8
llama_model_loader: - kv 15: llama.rope.freq_base f32 = 500000.000000
llama_model_loader: - kv 16: llama.attention.layer_norm_rms_epsilon f32 = 0.000010
llama_model_loader: - kv 17: general.file_type u32 = 15
llama_model_loader: - kv 18: llama.vocab_size u32 = 128256
llama_model_loader: - kv 19: llama.rope.dimension_count u32 = 128
llama_model_loader: - kv 20: tokenizer.ggml.model str = gpt2
llama_model_loader: - kv 21: tokenizer.ggml.pre str = llama-bpe
llama_model_loader: - kv 22: tokenizer.ggml.tokens arr[str,128256] = ["!", "\"", "#", "$", "%", "&", "'", ...
llama_model_loader: - kv 23: tokenizer.ggml.token_type arr[i32,128256] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv 24: tokenizer.ggml.merges arr[str,280147] = ["Ġ Ġ", "Ġ ĠĠĠ", "ĠĠ ĠĠ", "...
llama_model_loader: - kv 25: tokenizer.ggml.bos_token_id u32 = 128000
llama_model_loader: - kv 26: tokenizer.ggml.eos_token_id u32 = 128009
llama_model_loader: - kv 27: tokenizer.chat_template str = {{- bos_token }}\n{%- if custom_tools ...
llama_model_loader: - kv 28: general.quantization_version u32 = 2
llama_model_loader: - kv 29: quantize.imatrix.file str = /models_out/Meta-Llama-3.1-8B-Instruc...
llama_model_loader: - kv 30: quantize.imatrix.dataset str = /training_dir/calibration_datav3.txt
llama_model_loader: - kv 31: quantize.imatrix.entries_count i32 = 224
llama_model_loader: - kv 32: quantize.imatrix.chunks_count i32 = 125
llama_model_loader: - type f32: 66 tensors
llama_model_loader: - type q4_K: 193 tensors
llama_model_loader: - type q6_K: 33 tensors
llm_load_vocab: special tokens definition check successful ( 256/128256 ).
llm_load_print_meta: format = GGUF V3 (latest)
llm_load_print_meta: arch = llama
llm_load_print_meta: vocab type = BPE
llm_load_print_meta: n_vocab = 128256
llm_load_print_meta: n_merges = 280147
llm_load_print_meta: n_ctx_train = 131072
llm_load_print_meta: n_embd = 4096
llm_load_print_meta: n_head = 32
llm_load_print_meta: n_head_kv = 8
llm_load_print_meta: n_layer = 32
llm_load_print_meta: n_rot = 128
llm_load_print_meta: n_embd_head_k = 128
llm_load_print_meta: n_embd_head_v = 128
llm_load_print_meta: n_gqa = 4
llm_load_print_meta: n_embd_k_gqa = 1024
llm_load_print_meta: n_embd_v_gqa = 1024
llm_load_print_meta: f_norm_eps = 0.0e+00
llm_load_print_meta: f_norm_rms_eps = 1.0e-05
llm_load_print_meta: f_clamp_kqv = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: f_logit_scale = 0.0e+00
llm_load_print_meta: n_ff = 14336
llm_load_print_meta: n_expert = 0
llm_load_print_meta: n_expert_used = 0
llm_load_print_meta: causal attn = 1
llm_load_print_meta: pooling type = 0
llm_load_print_meta: rope type = 0
llm_load_print_meta: rope scaling = linear
llm_load_print_meta: freq_base_train = 500000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_yarn_orig_ctx = 131072
llm_load_print_meta: rope_finetuned = unknown
llm_load_print_meta: ssm_d_conv = 0
llm_load_print_meta: ssm_d_inner = 0
llm_load_print_meta: ssm_d_state = 0
llm_load_print_meta: ssm_dt_rank = 0
llm_load_print_meta: model type = 8B
llm_load_print_meta: model ftype = Q4_K - Medium
llm_load_print_meta: model params = 8.03 B
llm_load_print_meta: model size = 4.58 GiB (4.89 BPW)
llm_load_print_meta: general.name = Meta Llama 3.1 8B Instruct
llm_load_print_meta: BOS token = 128000 ''
llm_load_print_meta: EOS token = 128009 ''
llm_load_print_meta: LF token = 128 'Ä'
llm_load_print_meta: EOT token = 128009 ''
llm_load_tensors: ggml ctx size = 0.15 MiB
llama_model_load: error loading model: done_getting_tensors: wrong number of tensors; expected 292, got 291
llama_load_model_from_file: failed to load model
[134786771306176] 2024-08-02 10:05:33,861 - __main__ - ERROR - An error occurred: Failed to load model from file: /data/leoprojects/github/rag-chatbot/models/Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf
Traceback (most recent call last):
File "/data/leoprojects/github/rag-chatbot/.venv/lib/python3.10/site-packages/streamlit/runtime/caching/cache_utils.py", line 264, in _get_or_create_cached_value
cached_result = cache.read_result(value_key)
File "/data/leoprojects/github/rag-chatbot/.venv/lib/python3.10/site-packages/streamlit/runtime/caching/cache_resource_api.py", line 500, in read_result
raise CacheKeyNotFoundError()
streamlit.runtime.caching.cache_errors.CacheKeyNotFoundError
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/data/leoprojects/github/rag-chatbot/.venv/lib/python3.10/site-packages/streamlit/runtime/caching/cache_utils.py", line 312, in _handle_cache_miss
cached_result = cache.read_result(value_key)
File "/data/leoprojects/github/rag-chatbot/.venv/lib/python3.10/site-packages/streamlit/runtime/caching/cache_resource_api.py", line 500, in read_result
raise CacheKeyNotFoundError()
streamlit.runtime.caching.cache_errors.CacheKeyNotFoundError
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/data/leoprojects/github/rag-chatbot/chatbot/chatbot_app.py", line 165, in <module>
main(args)
File "/data/leoprojects/github/rag-chatbot/chatbot/chatbot_app.py", line 86, in main
llm = load_llm(client, model, model_folder)
File "/data/leoprojects/github/rag-chatbot/.venv/lib/python3.10/site-packages/streamlit/runtime/caching/cache_utils.py", line 212, in wrapper
return cached_func(*args, **kwargs)
File "/data/leoprojects/github/rag-chatbot/.venv/lib/python3.10/site-packages/streamlit/runtime/caching/cache_utils.py", line 241, in __call__
return self._get_or_create_cached_value(args, kwargs)
File "/data/leoprojects/github/rag-chatbot/.venv/lib/python3.10/site-packages/streamlit/runtime/caching/cache_utils.py", line 267, in _get_or_create_cached_value
return self._handle_cache_miss(cache, value_key, func_args, func_kwargs)
File "/data/leoprojects/github/rag-chatbot/.venv/lib/python3.10/site-packages/streamlit/runtime/caching/cache_utils.py", line 321, in _handle_cache_miss
computed_value = self._info.func(*func_args, **func_kwargs)
File "/data/leoprojects/github/rag-chatbot/chatbot/chatbot_app.py", line 25, in load_llm
llm = get_client(llm_client, model_folder=model_folder, model_settings=model_settings)
File "/data/leoprojects/github/rag-chatbot/chatbot/bot/client/client_settings.py", line 40, in get_client
return client(**kwargs)
File "/data/leoprojects/github/rag-chatbot/chatbot/bot/client/lama_cpp_client.py", line 16, in __init__
super().__init__(model_folder, model_settings)
File "/data/leoprojects/github/rag-chatbot/chatbot/bot/client/llm_client.py", line 50, in __init__
self.llm = self._load_llm()
File "/data/leoprojects/github/rag-chatbot/chatbot/bot/client/lama_cpp_client.py", line 19, in _load_llm
llm = Llama(model_path=str(self.model_path), **self.model_settings.config)
File "/data/leoprojects/github/rag-chatbot/.venv/lib/python3.10/site-packages/llama_cpp/llama.py", line 338, in __init__
self._model = _LlamaModel(
File "/data/leoprojects/github/rag-chatbot/.venv/lib/python3.10/site-packages/llama_cpp/_internals.py", line 57, in __init__
raise ValueError(f"Failed to load model from file: {path_model}")
ValueError: Failed to load model from file: /data/leoprojects/github/rag-chatbot/models/Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf
Stack (most recent call last):
File "/data/systems/miniconda3/envs/chat-box-poc/lib/python3.10/threading.py", line 973, in _bootstrap
self._bootstrap_inner()
File "/data/systems/miniconda3/envs/chat-box-poc/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
self.run()
File "/data/systems/miniconda3/envs/chat-box-poc/lib/python3.10/threading.py", line 953, in run
self._target(*self._args, **self._kwargs)
File "/data/leoprojects/github/rag-chatbot/.venv/lib/python3.10/site-packages/streamlit/runtime/scriptrunner/script_runner.py", line 286, in _run_script_thread
self._run_script(request.rerun_data)
File "/data/leoprojects/github/rag-chatbot/.venv/lib/python3.10/site-packages/streamlit/runtime/scriptrunner/script_runner.py", line 534, in _run_script
exec(code, module.__dict__)
File "/data/leoprojects/github/rag-chatbot/chatbot/chatbot_app.py", line 167, in <module>
logger.error(f"An error occurred: {str(error)}", exc_info=True, stack_info=True)
By the way, with
streamlit run chatbot/chatbot_app.py -- --model openchat-3.6 --max-new-tokens 1024
it worked as expected though, so is it llama3 causing issues?
I have passed make setup_cuda and make update.
Why did you run make update? Can you try cleaning the environment by running make clean and then just make setup_cuda?
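In other words, something like this (just a sketch, assuming the Makefile targets behave as described above; the pip3 show at the end is only there to double-check what actually got installed):
make clean
make setup_cuda
pip3 show llama-cpp-python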
it worked as expected though, so is it llama3 causing issues?
As far as I remember, llama3 was working with that lama_cpp version. Let me try on my side.
BTW, it is still interesting that the installation fails with the latest lama_cpp version on your side.
By the way, with streamlit run chatbot/chatbot_app.py -- --model openchat-3.6 --max-new-tokens 1024 it worked as expected though, so is it llama3 causing issues?
Yeah, I confirm that llama_cpp_python==0.2.76 supports llama 3.1.
I do think that running make update screwed up your environment.
@leowenlu about the initial error you got with the newer lama_cpp version, it seems there is an open issue on the official repo. I decided to roll back to 0.2.76 until newer versions are more stable.
I have upgraded to llama_cpp_python==0.2.85, and it looks like I am able to get llama3 working now.
Thanks for your help; looking forward to more great code and more releases from this project. Very well done.
@leowenlu how did you make it work? Did you upgrade with CMAKE_ARGS="-DGGML_CUDA=on", and are you using the GPU with llama_cpp_python==0.2.85 and llama3.1?
Yes, CMAKE_ARGS="-DGGML_CUDA=on" and llama_cpp_python==0.2.85 with llama3.1; it looks like it is working now.
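If you want to double-check that the CUDA build is really being used (just a sketch, not something from this project): llama.cpp prints an "offloaded N/M layers to GPU" line at startup when layers land on the GPU, and you can also watch the card directly. If I remember correctly, the low-level bindings also expose llama_supports_gpu_offload:
python3 -c "from llama_cpp import llama_supports_gpu_offload; print(llama_supports_gpu_offload())"
watch -n 1 nvidia-smi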
@bouajajais
nvcc --version
Python 3.10.14
Poetry (version 1.7.0)
I am getting the following errors, any clue?
note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for llama-cpp-python
Failed to build llama-cpp-python
ERROR: ERROR: Failed to build installable wheels for some pyproject.toml based projects (llama-cpp-python)