zylon-ai / private-gpt

Interact with your documents using the power of GPT, 100% privately, no data leaks
https://privategpt.dev
Apache License 2.0

Unable to start the application after update 0.5.0 #1856

Open · ashunaveed opened this issue 4 months ago

ashunaveed commented 4 months ago

I was able to start the application with 0.4.0, but when I try to start it with 0.5.0 I get the following output. Please help.

(gpt) C:\Users\genco\Desktop\docs\private-gpt-main>make run
poetry run python -m private_gpt
17:48:27.791 [INFO ] private_gpt.settings.settings_loader - Starting application with profiles=['default', 'local']
17:48:34.709 [INFO ] private_gpt.components.llm.llm_component - Initializing the LLM in mode=llamacpp
llama_model_loader: loaded meta data with 24 key-value pairs and 291 tensors from C:\Users\genco\Desktop\docs\private-gpt-main\models\mistral-7b-instruct-v0.2.Q8_0.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv  0: general.architecture str = llama
llama_model_loader: - kv  1: general.name str = mistralai_mistral-7b-instruct-v0.2
llama_model_loader: - kv  2: llama.context_length u32 = 32768
llama_model_loader: - kv  3: llama.embedding_length u32 = 4096
llama_model_loader: - kv  4: llama.block_count u32 = 32
llama_model_loader: - kv  5: llama.feed_forward_length u32 = 14336
llama_model_loader: - kv  6: llama.rope.dimension_count u32 = 128
llama_model_loader: - kv  7: llama.attention.head_count u32 = 32
llama_model_loader: - kv  8: llama.attention.head_count_kv u32 = 8
llama_model_loader: - kv  9: llama.attention.layer_norm_rms_epsilon f32 = 0.000010
llama_model_loader: - kv 10: llama.rope.freq_base f32 = 1000000.000000
llama_model_loader: - kv 11: general.file_type u32 = 7
llama_model_loader: - kv 12: tokenizer.ggml.model str = llama
llama_model_loader: - kv 13: tokenizer.ggml.tokens arr[str,32000] = ["<unk>", "<s>", "</s>", "<0x00>", "<...
llama_model_loader: - kv 14: tokenizer.ggml.scores arr[f32,32000] = [0.000000, 0.000000, 0.000000, 0.0000...
llama_model_loader: - kv 15: tokenizer.ggml.token_type arr[i32,32000] = [2, 3, 3, 6, 6, 6, 6, 6, 6, 6, 6, 6, ...
llama_model_loader: - kv 16: tokenizer.ggml.bos_token_id u32 = 1
llama_model_loader: - kv 17: tokenizer.ggml.eos_token_id u32 = 2
llama_model_loader: - kv 18: tokenizer.ggml.unknown_token_id u32 = 0
llama_model_loader: - kv 19: tokenizer.ggml.padding_token_id u32 = 0
llama_model_loader: - kv 20: tokenizer.ggml.add_bos_token bool = true
llama_model_loader: - kv 21: tokenizer.ggml.add_eos_token bool = false
llama_model_loader: - kv 22: tokenizer.chat_template str = {{ bos_token }}{% for message in mess...
llama_model_loader: - kv 23: general.quantization_version u32 = 2
llama_model_loader: - type f32: 65 tensors
llama_model_loader: - type q8_0: 226 tensors
llm_load_vocab: special tokens definition check successful ( 259/32000 ).
llm_load_print_meta: format = GGUF V3 (latest)
llm_load_print_meta: arch = llama
llm_load_print_meta: vocab type = SPM
llm_load_print_meta: n_vocab = 32000
llm_load_print_meta: n_merges = 0
llm_load_print_meta: n_ctx_train = 32768
llm_load_print_meta: n_embd = 4096
llm_load_print_meta: n_head = 32
llm_load_print_meta: n_head_kv = 8
llm_load_print_meta: n_layer = 32
llm_load_print_meta: n_rot = 128
llm_load_print_meta: n_embd_head_k = 128
llm_load_print_meta: n_embd_head_v = 128
llm_load_print_meta: n_gqa = 4
llm_load_print_meta: n_embd_k_gqa = 1024
llm_load_print_meta: n_embd_v_gqa = 1024
llm_load_print_meta: f_norm_eps = 0.0e+00
llm_load_print_meta: f_norm_rms_eps = 1.0e-05
llm_load_print_meta: f_clamp_kqv = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: f_logit_scale = 0.0e+00
llm_load_print_meta: n_ff = 14336
llm_load_print_meta: n_expert = 0
llm_load_print_meta: n_expert_used = 0
llm_load_print_meta: causal attn = 1
llm_load_print_meta: pooling type = 0
llm_load_print_meta: rope type = 0
llm_load_print_meta: rope scaling = linear
llm_load_print_meta: freq_base_train = 1000000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_yarn_orig_ctx = 32768
llm_load_print_meta: rope_finetuned = unknown
llm_load_print_meta: ssm_d_conv = 0
llm_load_print_meta: ssm_d_inner = 0
llm_load_print_meta: ssm_d_state = 0
llm_load_print_meta: ssm_dt_rank = 0
llm_load_print_meta: model type = 7B
llm_load_print_meta: model ftype = Q8_0
llm_load_print_meta: model params = 7.24 B
llm_load_print_meta: model size = 7.17 GiB (8.50 BPW)
llm_load_print_meta: general.name = mistralai_mistral-7b-instruct-v0.2
llm_load_print_meta: BOS token = 1 '<s>'
llm_load_print_meta: EOS token = 2 '</s>'
llm_load_print_meta: UNK token = 0 '<unk>'
llm_load_print_meta: PAD token = 0 '<unk>'
llm_load_print_meta: LF token = 13 '<0x0A>'
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: CUDA_USE_TENSOR_CORES: yes
ggml_cuda_init: found 1 CUDA devices:
  Device 0: NVIDIA GeForce RTX 4050 Laptop GPU, compute capability 8.9, VMM: yes
llm_load_tensors: ggml ctx size = 0.22 MiB
llm_load_tensors: offloading 32 repeating layers to GPU
llm_load_tensors: offloading non-repeating layers to GPU
llm_load_tensors: offloaded 33/33 layers to GPU
llm_load_tensors: CPU buffer size = 132.81 MiB
llm_load_tensors: CUDA0 buffer size = 7205.83 MiB
...................................................................................................
llama_new_context_with_model: n_ctx = 32000
llama_new_context_with_model: n_batch = 512
llama_new_context_with_model: n_ubatch = 512
llama_new_context_with_model: freq_base = 1000000.0
llama_new_context_with_model: freq_scale = 1
llama_kv_cache_init: CUDA0 KV buffer size = 4000.00 MiB
llama_new_context_with_model: KV self size = 4000.00 MiB, K (f16): 2000.00 MiB, V (f16): 2000.00 MiB
llama_new_context_with_model: CUDA_Host output buffer size = 0.12 MiB
llama_new_context_with_model: CUDA0 compute buffer size = 2094.50 MiB
llama_new_context_with_model: CUDA_Host compute buffer size = 70.51 MiB
llama_new_context_with_model: graph nodes = 1030
llama_new_context_with_model: graph splits = 2
AVX = 1 | AVX_VNNI = 0 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 0 | VSX = 0 | MATMUL_INT8 = 0 |
Model metadata: {'general.name': 'mistralai_mistral-7b-instruct-v0.2', 'general.architecture': 'llama', 'llama.context_length': '32768', 'llama.rope.dimension_count': '128', 'llama.embedding_length': '4096', 'llama.block_count': '32', 'llama.feed_forward_length': '14336', 'llama.attention.head_count': '32', 'tokenizer.ggml.eos_token_id': '2', 'general.file_type': '7', 'llama.attention.head_count_kv': '8', 'llama.attention.layer_norm_rms_epsilon': '0.000010', 'llama.rope.freq_base': '1000000.000000', 'tokenizer.ggml.model': 'llama', 'general.quantization_version': '2', 'tokenizer.ggml.bos_token_id': '1', 'tokenizer.ggml.unknown_token_id': '0', 'tokenizer.ggml.padding_token_id': '0', 'tokenizer.ggml.add_bos_token': 'true', 'tokenizer.ggml.add_eos_token': 'false', 'tokenizer.chat_template': "{{ bos_token }}{% for message in messages %}{% if (message['role'] == 'user') != (loop.index0 % 2 == 0) %}{{ raise_exception('Conversation roles must alternate user/assistant/user/assistant/...') }}{% endif %}{% if message['role'] == 'user' %}{{ '[INST] ' + message['content'] + ' [/INST]' }}{% elif message['role'] == 'assistant' %}{{ message['content'] + eos_token}}{% else %}{{ raise_exception('Only user and assistant roles are supported!') }}{% endif %}{% endfor %}"}
Guessed chat format: mistral-instruct
17:48:39.584 [INFO ] private_gpt.components.embedding.embedding_component - Initializing the embedding model in mode=huggingface
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████| 3/3 [00:14<00:00, 4.98s/it]
Traceback (most recent call last):
  File "C:\ProgramData\anaconda3\envs\gpt\Lib\site-packages\injector\__init__.py", line 798, in get
    return self._context[key]
           ~~~~~~~~~~~~~^^^^^
KeyError: <class 'private_gpt.ui.ui.PrivateGptUi'>

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\ProgramData\anaconda3\envs\gpt\Lib\site-packages\injector\__init__.py", line 798, in get
    return self._context[key]
           ~~~~~~~~~~~~~^^^^^
KeyError: <class 'private_gpt.server.ingest.ingest_service.IngestService'>

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\ProgramData\anaconda3\envs\gpt\Lib\site-packages\injector\__init__.py", line 798, in get
    return self._context[key]
           ~~~~~~~~~~~~~^^^^^
KeyError: <class 'private_gpt.components.embedding.embedding_component.EmbeddingComponent'>

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "C:\Users\genco\Desktop\docs\private-gpt-main\private_gpt\__main__.py", line 5, in <module>
    from private_gpt.main import app
  File "C:\Users\genco\Desktop\docs\private-gpt-main\private_gpt\main.py", line 6, in <module>
    app = create_app(global_injector)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\genco\Desktop\docs\private-gpt-main\private_gpt\launcher.py", line 63, in create_app
    ui = root_injector.get(PrivateGptUi)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\gpt\Lib\site-packages\injector\__init__.py", line 91, in wrapper
    return function(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\gpt\Lib\site-packages\injector\__init__.py", line 974, in get
    provider_instance = scope_instance.get(interface, binding.provider)
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\gpt\Lib\site-packages\injector\__init__.py", line 91, in wrapper
    return function(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\gpt\Lib\site-packages\injector\__init__.py", line 800, in get
    instance = self._get_instance(key, provider, self.injector)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\gpt\Lib\site-packages\injector\__init__.py", line 811, in _get_instance
    return provider.get(injector)
           ^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\gpt\Lib\site-packages\injector\__init__.py", line 264, in get
    return injector.create_object(self._cls)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\gpt\Lib\site-packages\injector\__init__.py", line 998, in create_object
    self.call_with_injection(init, self_=instance, kwargs=additional_kwargs)
  File "C:\ProgramData\anaconda3\envs\gpt\Lib\site-packages\injector\__init__.py", line 1031, in call_with_injection
    dependencies = self.args_to_inject(
                   ^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\gpt\Lib\site-packages\injector\__init__.py", line 91, in wrapper
    return function(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\gpt\Lib\site-packages\injector\__init__.py", line 1079, in args_to_inject
    instance: Any = self.get(interface)
                    ^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\gpt\Lib\site-packages\injector\__init__.py", line 91, in wrapper
    return function(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\gpt\Lib\site-packages\injector\__init__.py", line 974, in get
    provider_instance = scope_instance.get(interface, binding.provider)
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\gpt\Lib\site-packages\injector\__init__.py", line 91, in wrapper
    return function(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\gpt\Lib\site-packages\injector\__init__.py", line 800, in get
    instance = self._get_instance(key, provider, self.injector)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\gpt\Lib\site-packages\injector\__init__.py", line 811, in _get_instance
    return provider.get(injector)
           ^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\gpt\Lib\site-packages\injector\__init__.py", line 264, in get
    return injector.create_object(self._cls)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\gpt\Lib\site-packages\injector\__init__.py", line 998, in create_object
    self.call_with_injection(init, self_=instance, kwargs=additional_kwargs)
  File "C:\ProgramData\anaconda3\envs\gpt\Lib\site-packages\injector\__init__.py", line 1031, in call_with_injection
    dependencies = self.args_to_inject(
                   ^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\gpt\Lib\site-packages\injector\__init__.py", line 91, in wrapper
    return function(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\gpt\Lib\site-packages\injector\__init__.py", line 1079, in args_to_inject
    instance: Any = self.get(interface)
                    ^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\gpt\Lib\site-packages\injector\__init__.py", line 91, in wrapper
    return function(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\gpt\Lib\site-packages\injector\__init__.py", line 974, in get
    provider_instance = scope_instance.get(interface, binding.provider)
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\gpt\Lib\site-packages\injector\__init__.py", line 91, in wrapper
    return function(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\gpt\Lib\site-packages\injector\__init__.py", line 800, in get
    instance = self._get_instance(key, provider, self.injector)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\gpt\Lib\site-packages\injector\__init__.py", line 811, in _get_instance
    return provider.get(injector)
           ^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\gpt\Lib\site-packages\injector\__init__.py", line 264, in get
    return injector.create_object(self._cls)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\gpt\Lib\site-packages\injector\__init__.py", line 998, in create_object
    self.call_with_injection(init, self_=instance, kwargs=additional_kwargs)
  File "C:\ProgramData\anaconda3\envs\gpt\Lib\site-packages\injector\__init__.py", line 1040, in call_with_injection
    return callable(*full_args, **dependencies)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\genco\Desktop\docs\private-gpt-main\private_gpt\components\embedding\embedding_component.py", line 31, in __init__
    self.embedding_model = HuggingFaceEmbedding(
                           ^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\gpt\Lib\site-packages\llama_index\embeddings\huggingface\base.py", line 87, in __init__
    self._model = model.to(self._device)
                  ^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\gpt\Lib\site-packages\transformers\modeling_utils.py", line 2556, in to
    return super().to(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\gpt\Lib\site-packages\torch\nn\modules\module.py", line 1152, in to
    return self._apply(convert)
           ^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\gpt\Lib\site-packages\torch\nn\modules\module.py", line 802, in _apply
    module._apply(fn)
  File "C:\ProgramData\anaconda3\envs\gpt\Lib\site-packages\torch\nn\modules\module.py", line 802, in _apply
    module._apply(fn)
  File "C:\ProgramData\anaconda3\envs\gpt\Lib\site-packages\torch\nn\modules\module.py", line 802, in _apply
    module._apply(fn)
  [Previous line repeated 1 more time]
  File "C:\ProgramData\anaconda3\envs\gpt\Lib\site-packages\torch\nn\modules\module.py", line 825, in _apply
    param_applied = fn(param)
                    ^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\gpt\Lib\site-packages\torch\nn\modules\module.py", line 1150, in convert
    return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 64.00 MiB. GPU 0 has a total capacity of 6.00 GiB of which 0 bytes is free. Of the allocated memory 23.36 GiB is allocated by PyTorch, and 1.14 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation.  See documentation for Memory Management  (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
make: *** [Makefile:36: run] Error 1
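
The log above shows where the memory went: llama.cpp offloaded all 33 layers of the Q8_0 model (7205.83 MiB of weights) to the 6 GiB RTX 4050 and reserved a 4000.00 MiB KV cache plus a 2094.50 MiB compute buffer for the 32000-token context, so when the HuggingFace embedding model is then moved to the same GPU via model.to(self._device), no VRAM is left. A minimal sketch of one possible workaround, assuming llama_index's HuggingFaceEmbedding forwards its device argument to model.to() (the traceback suggests it does): pin the embedding model to the CPU at the constructor call in embedding_component.py shown at line 31 of the traceback. The keyword arguments other than device are illustrative, not copied from the project:

# private_gpt/components/embedding/embedding_component.py -- sketch only.
# Keep whatever arguments the constructor already receives; the suggested
# change is the explicit device, so the embedder stops competing with the
# llama.cpp layers already offloaded to the 6 GiB GPU.
self.embedding_model = HuggingFaceEmbedding(
    model_name=settings.huggingface.embedding_hf_model_name,  # illustrative (assumption)
    device="cpu",  # assumption: forwarded to model.to(...), skipping the CUDA allocation
)

Alternatively, lowering the configured context window shrinks the 4000 MiB KV cache (it scales roughly linearly with n_ctx), and offloading fewer than 33 layers frees VRAM at the cost of generation speed.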
ashunaveed commented 4 months ago

Even if I set PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True, I am getting the same error.
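
That is consistent with the error message: expandable_segments only mitigates fragmentation of memory PyTorch has already reserved, and the traceback reports 0 bytes free on the device, so there is nothing for the allocator to expand into. (Note also that on Windows, set only applies to the current cmd session, so it has to be issued in the same shell that runs make run.) A quick way to confirm how much VRAM is actually free before PrivateGPT starts, using the standard torch.cuda.mem_get_info API; the script name is arbitrary:

# check_vram.py -- run in the same environment, with nothing else using the GPU
import torch

if torch.cuda.is_available():
    free, total = torch.cuda.mem_get_info()  # (free, total) in bytes for the current device
    print(torch.cuda.get_device_name(0))
    print(f"free : {free / 2**20:,.0f} MiB")
    print(f"total: {total / 2**20:,.0f} MiB")
else:
    print("CUDA is not available to this PyTorch build")

If free VRAM is already near zero once the llama.cpp model is loaded, no allocator setting will help; either the embedding model or some of the LLM layers have to move off the GPU.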