harnalashok opened this issue 6 months ago
Opening and closing the terminal shouldn't matter. Probably it's not compiling the CUDA version and you are only getting the CPU version. Can you give an expanded, full version of your error from steps 6-7?
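For example, one way to capture the complete build log is to re-run step 6 with verbose pip output and tee it to a file (just a sketch; the log file name here is arbitrary):

```bash
# Re-run step 6 with verbose output and save the full build log to a file
pip install -r reqs_optional/requirements_optional_llamacpp_gpt4all.txt --no-cache-dir -v 2>&1 | tee llamacpp_build.log
```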
I have repeated the experiment three times. The behavior is the same as I narrated before: one of the variables exported in paragraph A has to be unset before I execute instruction 6. Here is the complete trace of execution if I do not close the terminal but continue to work in the same terminal. (If I close and re-open the terminal, the process completes successfully; please see those results below as well.)
```
(base) ashok@ashok:~/h2ogpt$ pip install -r reqs_optional/requirements_optional_llamacpp_gpt4all.txt --no-cache-dir
Looking in indexes: https://pypi.org/simple, https://download.pytorch.org/whl/cu121, https://huggingface.github.io/autogptq-index/whl/cu121
Collecting gpt4all==1.0.5 (from -r reqs_optional/requirements_optional_llamacpp_gpt4all.txt (line 1))
  Downloading gpt4all-1.0.5-py3-none-manylinux1_x86_64.whl.metadata (912 bytes)
Collecting llama-cpp-python==0.2.56 (from -r reqs_optional/requirements_optional_llamacpp_gpt4all.txt (line 4))
  Downloading llama_cpp_python-0.2.56.tar.gz (36.9 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 36.9/36.9 MB 28.8 MB/s eta 0:00:00
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Installing backend dependencies ... done
  Preparing metadata (pyproject.toml) ... done
Requirement already satisfied: requests in /home/ashok/anaconda3/lib/python3.11/site-packages (from gpt4all==1.0.5->-r reqs_optional/requirements_optional_llamacpp_gpt4all.txt (line 1)) (2.31.0)
Requirement already satisfied: tqdm in /home/ashok/anaconda3/lib/python3.11/site-packages (from gpt4all==1.0.5->-r reqs_optional/requirements_optional_llamacpp_gpt4all.txt (line 1)) (4.66.4)
Requirement already satisfied: typing-extensions>=4.5.0 in /home/ashok/anaconda3/lib/python3.11/site-packages (from llama-cpp-python==0.2.56->-r reqs_optional/requirements_optional_llamacpp_gpt4all.txt (line 4)) (4.9.0)
Requirement already satisfied: numpy>=1.20.0 in /home/ashok/anaconda3/lib/python3.11/site-packages (from llama-cpp-python==0.2.56->-r reqs_optional/requirements_optional_llamacpp_gpt4all.txt (line 4)) (1.26.4)
Collecting diskcache>=5.6.1 (from llama-cpp-python==0.2.56->-r reqs_optional/requirements_optional_llamacpp_gpt4all.txt (line 4))
  Downloading diskcache-5.6.3-py3-none-any.whl.metadata (20 kB)
Requirement already satisfied: jinja2>=2.11.3 in /home/ashok/anaconda3/lib/python3.11/site-packages (from llama-cpp-python==0.2.56->-r reqs_optional/requirements_optional_llamacpp_gpt4all.txt (line 4)) (3.1.3)
Requirement already satisfied: MarkupSafe>=2.0 in /home/ashok/anaconda3/lib/python3.11/site-packages (from jinja2>=2.11.3->llama-cpp-python==0.2.56->-r reqs_optional/requirements_optional_llamacpp_gpt4all.txt (line 4)) (2.1.3)
Requirement already satisfied: charset-normalizer<4,>=2 in /home/ashok/anaconda3/lib/python3.11/site-packages (from requests->gpt4all==1.0.5->-r reqs_optional/requirements_optional_llamacpp_gpt4all.txt (line 1)) (3.3.2)
Requirement already satisfied: idna<4,>=2.5 in /home/ashok/anaconda3/lib/python3.11/site-packages (from requests->gpt4all==1.0.5->-r reqs_optional/requirements_optional_llamacpp_gpt4all.txt (line 1)) (3.4)
Requirement already satisfied: urllib3<3,>=1.21.1 in /home/ashok/anaconda3/lib/python3.11/site-packages (from requests->gpt4all==1.0.5->-r reqs_optional/requirements_optional_llamacpp_gpt4all.txt (line 1)) (2.0.7)
Requirement already satisfied: certifi>=2017.4.17 in /home/ashok/anaconda3/lib/python3.11/site-packages (from requests->gpt4all==1.0.5->-r reqs_optional/requirements_optional_llamacpp_gpt4all.txt (line 1)) (2024.2.2)
Downloading gpt4all-1.0.5-py3-none-manylinux1_x86_64.whl (3.6 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 3.6/3.6 MB 9.1 MB/s eta 0:00:00
Downloading diskcache-5.6.3-py3-none-any.whl (45 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 45.5/45.5 kB 34.3 MB/s eta 0:00:00
Building wheels for collected packages: llama-cpp-python
  Building wheel for llama-cpp-python (pyproject.toml) ... error
  error: subprocess-exited-with-error

  × Building wheel for llama-cpp-python (pyproject.toml) did not run successfully.
  │ exit code: 1
  ╰─> [45 lines of output]
scikit-build-core 0.9.4 using CMake 3.29.3 (wheel)
Configuring CMake...
2024-05-22 09:52:26,643 - scikit_build_core - WARNING - Can't find a Python library, got libdir=/home/ashok/anaconda3/lib, ldlibrary=libpython3.11.a, multiarch=x86_64-linux-gnu, masd=None
loading initial cache file /tmp/tmpnyigg61a/build/CMakeInit.txt
-- The C compiler identification is GNU 11.4.0
-- The CXX compiler identification is GNU 11.4.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Found Git: /usr/bin/git (found version "2.34.1")
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
-- Found Threads: TRUE
-- Could not find nvcc, please set CUDAToolkit_ROOT.
CMake Warning at vendor/llama.cpp/CMakeLists.txt:407 (message):
  cuBLAS not found
-- CUDA host compiler is GNU
CMake Error at vendor/llama.cpp/CMakeLists.txt:835 (get_flags):
get_flags Function invoked with incorrect arguments for function named:
get_flags
-- Warning: ccache not found - consider installing it for faster compilation or disable this warning with LLAMA_CCACHE=OFF
-- CMAKE_SYSTEM_PROCESSOR: x86_64
-- x86 detected
CMake Warning (dev) at CMakeLists.txt:21 (install):
Target llama has PUBLIC_HEADER files but no PUBLIC_HEADER DESTINATION.
This warning is for project developers. Use -Wno-dev to suppress it.
CMake Warning (dev) at CMakeLists.txt:30 (install):
Target llama has PUBLIC_HEADER files but no PUBLIC_HEADER DESTINATION.
This warning is for project developers. Use -Wno-dev to suppress it.
-- Configuring incomplete, errors occurred!
*** CMake configuration failed
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for llama-cpp-python
Failed to build llama-cpp-python
ERROR: Could not build wheels for llama-cpp-python, which is required to install pyproject.toml-based projects
```

Here is what happens if I execute instruction 6 after I close and re-open the terminal. No error:
```
pip install -r reqs_optional/requirements_optional_llamacpp_gpt4all.txt --no-cache-dir
Collecting gpt4all==1.0.5 (from -r reqs_optional/requirements_optional_llamacpp_gpt4all.txt (line 1))
  Downloading gpt4all-1.0.5-py3-none-manylinux1_x86_64.whl.metadata (912 bytes)
Collecting llama-cpp-python==0.2.56 (from -r reqs_optional/requirements_optional_llamacpp_gpt4all.txt (line 4))
  Downloading llama_cpp_python-0.2.56.tar.gz (36.9 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 36.9/36.9 MB 23.2 MB/s eta 0:00:00
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Installing backend dependencies ... done
  Preparing metadata (pyproject.toml) ... done
Requirement already satisfied: requests in /home/ashok/anaconda3/lib/python3.11/site-packages (from gpt4all==1.0.5->-r reqs_optional/requirements_optional_llamacpp_gpt4all.txt (line 1)) (2.31.0)
Requirement already satisfied: tqdm in /home/ashok/anaconda3/lib/python3.11/site-packages (from gpt4all==1.0.5->-r reqs_optional/requirements_optional_llamacpp_gpt4all.txt (line 1)) (4.66.4)
Requirement already satisfied: typing-extensions>=4.5.0 in /home/ashok/anaconda3/lib/python3.11/site-packages (from llama-cpp-python==0.2.56->-r reqs_optional/requirements_optional_llamacpp_gpt4all.txt (line 4)) (4.9.0)
Requirement already satisfied: numpy>=1.20.0 in /home/ashok/anaconda3/lib/python3.11/site-packages (from llama-cpp-python==0.2.56->-r reqs_optional/requirements_optional_llamacpp_gpt4all.txt (line 4)) (1.26.4)
Collecting diskcache>=5.6.1 (from llama-cpp-python==0.2.56->-r reqs_optional/requirements_optional_llamacpp_gpt4all.txt (line 4))
  Downloading diskcache-5.6.3-py3-none-any.whl.metadata (20 kB)
Requirement already satisfied: jinja2>=2.11.3 in /home/ashok/anaconda3/lib/python3.11/site-packages (from llama-cpp-python==0.2.56->-r reqs_optional/requirements_optional_llamacpp_gpt4all.txt (line 4)) (3.1.3)
Requirement already satisfied: MarkupSafe>=2.0 in /home/ashok/anaconda3/lib/python3.11/site-packages (from jinja2>=2.11.3->llama-cpp-python==0.2.56->-r reqs_optional/requirements_optional_llamacpp_gpt4all.txt (line 4)) (2.1.3)
Requirement already satisfied: charset-normalizer<4,>=2 in /home/ashok/anaconda3/lib/python3.11/site-packages (from requests->gpt4all==1.0.5->-r reqs_optional/requirements_optional_llamacpp_gpt4all.txt (line 1)) (3.3.2)
Requirement already satisfied: idna<4,>=2.5 in /home/ashok/anaconda3/lib/python3.11/site-packages (from requests->gpt4all==1.0.5->-r reqs_optional/requirements_optional_llamacpp_gpt4all.txt (line 1)) (3.4)
Requirement already satisfied: urllib3<3,>=1.21.1 in /home/ashok/anaconda3/lib/python3.11/site-packages (from requests->gpt4all==1.0.5->-r reqs_optional/requirements_optional_llamacpp_gpt4all.txt (line 1)) (2.0.7)
Requirement already satisfied: certifi>=2017.4.17 in /home/ashok/anaconda3/lib/python3.11/site-packages (from requests->gpt4all==1.0.5->-r reqs_optional/requirements_optional_llamacpp_gpt4all.txt (line 1)) (2024.2.2)
Downloading gpt4all-1.0.5-py3-none-manylinux1_x86_64.whl (3.6 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 3.6/3.6 MB 25.7 MB/s eta 0:00:00
Downloading diskcache-5.6.3-py3-none-any.whl (45 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 45.5/45.5 kB 31.4 MB/s eta 0:00:00
Building wheels for collected packages: llama-cpp-python
  Building wheel for llama-cpp-python (pyproject.toml) ... done
  Created wheel for llama-cpp-python: filename=llama_cpp_python-0.2.56-cp311-cp311-linux_x86_64.whl size=2827201 sha256=07293d75ff82ed6104572cae4fae96fc4fbb0f896b05211463ffd296aab81204
  Stored in directory: /tmp/pip-ephem-wheel-cache-dxt7ajop/wheels/f5/48/62/014b1a3c38f77df21219f81ed63ca4c09531d52a205b15d8e4
Successfully built llama-cpp-python
Installing collected packages: diskcache, llama-cpp-python, gpt4all
Successfully installed diskcache-5.6.3 gpt4all-1.0.5 llama-cpp-python-0.2.56
```
I see the `-- Could not find nvcc, please set CUDAToolkit_ROOT.` and `cuBLAS not found` messages; that means something is wrong with the CUDA installation.
Try installing CUDA 12.1 again and ensure CUDA_HOME is set, etc.
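As a rough sketch (assuming the toolkit lands in the default /usr/local/cuda-12.1 location; adjust the path if your installer put it elsewhere), something like this makes nvcc visible to the build:

```bash
# Assumes CUDA 12.1 was installed to the default location; adjust the path otherwise
export CUDA_HOME=/usr/local/cuda-12.1
export PATH="$CUDA_HOME/bin:$PATH"
export LD_LIBRARY_PATH="$CUDA_HOME/lib64:${LD_LIBRARY_PATH:-}"

# nvcc should now be found, which is what the CMake configure step above was missing
nvcc --version
```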
Installing CUDA 12.1 solves that issue. But then I face another error when I execute python generate.py. Here is the complete trace. Kindly help:
```
python generate.py --base_model=TheBloke/Mistral-7B-Instruct-v0.2-GGUF --prompt_type=mistral --max_seq_len=4096
/home/ashok/anaconda3/lib/python3.11/site-packages/pydub/utils.py:170: RuntimeWarning: Couldn't find ffmpeg or avconv - defaulting to ffmpeg, but may not work
warn("Couldn't find ffmpeg or avconv - defaulting to ffmpeg, but may not work", RuntimeWarning)
soundfile, librosa, and wavio not installed, disabling STT
soundfile, librosa, and wavio not installed, disabling TTS
Using Model llama
load INSTRUCTOR_Transformer
max_seq_length 512
Must install DocTR and LangChain installed if enabled DocTR, disabling
Starting get_model: llama
Failed to listen to n_gpus: No module named 'llama_cpp_cuda', trying llama_cpp module
ggml_init_cublas: GGML_CUDA_FORCE_MMQ: no
ggml_init_cublas: CUDA_USE_TENSOR_CORES: yes
ggml_init_cublas: found 1 CUDA devices:
Device 0: NVIDIA GeForce GTX 1060 with Max-Q Design, compute capability 6.1, VMM: yes
llama_model_loader: loaded meta data with 24 key-value pairs and 291 tensors from llamacpp_path/mistral-7b-instruct-v0.2.Q5_K_M.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = llama
llama_model_loader: - kv 1: general.name str = mistralai_mistral-7b-instruct-v0.2
llama_model_loader: - kv 2: llama.context_length u32 = 32768
llama_model_loader: - kv 3: llama.embedding_length u32 = 4096
llama_model_loader: - kv 4: llama.block_count u32 = 32
llama_model_loader: - kv 5: llama.feed_forward_length u32 = 14336
llama_model_loader: - kv 6: llama.rope.dimension_count u32 = 128
llama_model_loader: - kv 7: llama.attention.head_count u32 = 32
llama_model_loader: - kv 8: llama.attention.head_count_kv u32 = 8
llama_model_loader: - kv 9: llama.attention.layer_norm_rms_epsilon f32 = 0.000010
llama_model_loader: - kv 10: llama.rope.freq_base f32 = 1000000.000000
llama_model_loader: - kv 11: general.file_type u32 = 17
llama_model_loader: - kv 12: tokenizer.ggml.model str = llama
llama_model_loader: - kv 13: tokenizer.ggml.tokens arr[str,32000] = ["", "", "<0x00>", "<...
llama_model_loader: - kv 14: tokenizer.ggml.scores arr[f32,32000] = [0.000000, 0.000000, 0.000000, 0.0000...
llama_model_loader: - kv 15: tokenizer.ggml.token_type arr[i32,32000] = [2, 3, 3, 6, 6, 6, 6, 6, 6, 6, 6, 6, ...
llama_model_loader: - kv 16: tokenizer.ggml.bos_token_id u32 = 1
llama_model_loader: - kv 17: tokenizer.ggml.eos_token_id u32 = 2
llama_model_loader: - kv 18: tokenizer.ggml.unknown_token_id u32 = 0
llama_model_loader: - kv 19: tokenizer.ggml.padding_token_id u32 = 0
llama_model_loader: - kv 20: tokenizer.ggml.add_bos_token bool = true
llama_model_loader: - kv 21: tokenizer.ggml.add_eos_token bool = false
llama_model_loader: - kv 22: tokenizer.chat_template str = {{ bos_token }}{% for message in mess...
llama_model_loader: - kv 23: general.quantization_version u32 = 2
llama_model_loader: - type f32: 65 tensors
llama_model_loader: - type q5_K: 193 tensors
llama_model_loader: - type q6_K: 33 tensors
llm_load_vocab: special tokens definition check successful ( 259/32000 ).
llm_load_print_meta: format = GGUF V3 (latest)
llm_load_print_meta: arch = llama
llm_load_print_meta: vocab type = SPM
llm_load_print_meta: n_vocab = 32000
llm_load_print_meta: n_merges = 0
llm_load_print_meta: n_ctx_train = 32768
llm_load_print_meta: n_embd = 4096
llm_load_print_meta: n_head = 32
llm_load_print_meta: n_head_kv = 8
llm_load_print_meta: n_layer = 32
llm_load_print_meta: n_rot = 128
llm_load_print_meta: n_embd_head_k = 128
llm_load_print_meta: n_embd_head_v = 128
llm_load_print_meta: n_gqa = 4
llm_load_print_meta: n_embd_k_gqa = 1024
llm_load_print_meta: n_embd_v_gqa = 1024
llm_load_print_meta: f_norm_eps = 0.0e+00
llm_load_print_meta: f_norm_rms_eps = 1.0e-05
llm_load_print_meta: f_clamp_kqv = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: n_ff = 14336
llm_load_print_meta: n_expert = 0
llm_load_print_meta: n_expert_used = 0
llm_load_print_meta: pooling type = 0
llm_load_print_meta: rope type = 0
llm_load_print_meta: rope scaling = linear
llm_load_print_meta: freq_base_train = 1000000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_yarn_orig_ctx = 32768
llm_load_print_meta: rope_finetuned = unknown
llm_load_print_meta: ssm_d_conv = 0
llm_load_print_meta: ssm_d_inner = 0
llm_load_print_meta: ssm_d_state = 0
llm_load_print_meta: ssm_dt_rank = 0
llm_load_print_meta: model type = 7B
llm_load_print_meta: model ftype = Q5_K - Medium
llm_load_print_meta: model params = 7.24 B
llm_load_print_meta: model size = 4.78 GiB (5.67 BPW)
llm_load_print_meta: general.name = mistralai_mistral-7b-instruct-v0.2
llm_load_print_meta: BOS token = 1 ''
llm_load_print_meta: EOS token = 2 ''
llm_load_print_meta: UNK token = 0 '
```
It means it can't find the file or the file is corrupt.
This command, which you shared, works for me:
python generate.py --base_model=TheBloke/Mistral-7B-Instruct-v0.2-GGUF --prompt_type=mistral --max_seq_len=4096
I deleted my llamacpp_path folder and tried again; the model downloads fine and is then used correctly.
Maybe at some point in the past you got a corrupted or incomplete version of the file.
Please delete the file llamacpp_path/mistral-7b-instruct-v0.2.Q5_K_M.gguf
and try again. Or try to use that file with llama.cpp directly and see if that works. If it does work with llama.cpp, then I'm confused.
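A minimal sketch of those two options (the llama.cpp build commands and paths below are assumptions for illustration, not part of the h2oGPT docs):

```bash
# Option 1: delete the possibly corrupted download so h2oGPT re-fetches it on the next run
rm llamacpp_path/mistral-7b-instruct-v0.2.Q5_K_M.gguf

# Option 2: sanity-check the existing GGUF file directly with llama.cpp
# (assumes a separate llama.cpp checkout built with CUDA; the model path is illustrative)
git clone https://github.com/ggerganov/llama.cpp && cd llama.cpp
make LLAMA_CUBLAS=1
./main -m ../h2ogpt/llamacpp_path/mistral-7b-instruct-v0.2.Q5_K_M.gguf -p "Hello" -n 32 -ngl 20
```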
The h2ogpt Linux installation method as given here is as follows:

A. Variable export instructions:
export PIP_EXTRA_INDEX_URL="https://download.pytorch.org/whl/cu118 https://huggingface.github.io/autogptq-index/whl/cu118"
export LLAMA_CUBLAS=1
export CMAKE_ARGS="-DLLAMA_CUBLAS=on -DCMAKE_CUDA_ARCHITECTURES=all"
export FORCE_CMAKE=1
B. Then, one is required to run the following seven instructions (the numbers are mine):
1. git clone https://github.com/h2oai/h2ogpt.git
2. cd h2ogpt
3. pip install -r requirements.txt
4. pip install -r reqs_optional/requirements_optional_langchain.txt
5. pip uninstall llama_cpp_python llama_cpp_python_cuda -y
6. pip install -r reqs_optional/requirements_optional_llamacpp_gpt4all.txt --no-cache-dir
7. pip install -r reqs_optional/requirements_optional_langchain.urls.txt
Executing instructions 6 and 7 results in the following error:
-- Configuring incomplete, errors occurred!
*** CMake configuration failed
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for llama-cpp-python
Failed to build llama-cpp-python
ERROR: Could not build wheels for llama-cpp-python, which is required to install pyproject.toml-based projects
However, this error is avoided if, after executing instructions 1 to 4, the terminal is closed and re-opened before instructions 6 and 7 are executed. In other words, the variable export instructions issued earlier cause the error in instructions 6 and 7; the sketch below shows the equivalent of re-opening the terminal.
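A minimal sketch of what I mean by that equivalence, assuming only the variables exported in paragraph A are involved:

```bash
# Undo the paragraph A exports in the same terminal before running instruction 6
# (equivalent in effect to closing and re-opening the terminal)
unset PIP_EXTRA_INDEX_URL
unset LLAMA_CUBLAS
unset CMAKE_ARGS
unset FORCE_CMAKE
```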
This may please be rechecked at your end and the installation document corrected accordingly. Ashok Kumar Harnal