oobabooga / text-generation-webui

A Gradio web UI for Large Language Models.
GNU Affero General Public License v3.0

Exception: Cannot import 'llama_cpp_cuda' because 'llama_cpp' is already imported. Switching to a different version of llama-cpp-python currently requires a server restart. #6235

Open · dark-passages opened 4 months ago

dark-passages commented 4 months ago

Describe the bug

After downloading a model, I try to load it, but I get this message on the console: Exception: Cannot import 'llama-cpp-cuda' because 'llama-cpp' is already imported. Switching to a different version of llama-cpp-python currently requires a server restart.

Is there an existing issue for this?

Reproduction

1. Install Ubuntu 24.04
2. Update & upgrade
3. Clone the git repo
4. bash start_linux.sh --listen --gradio-auth user:passw
5. Open the site in a browser
6. Select any GGUF model and download it
7. Refresh the list of models
8. Select the GGUF model
9. Press Load

Screenshot

(Screenshot of the web UI: screencapture-192-168-178-8-7860-2024-07-14-23_10_59)

Logs

INFO Loading "yarn-mistral-7b-128k.Q4_K_M.gguf"
INFO llama.cpp weights detected: "models/yarn-mistral-7b-128k.Q4_K_M.gguf"
ERROR Failed to load the model.
Traceback (most recent call last):
 File "/home/arjan/oobabooga/text -generation-webui-main/modules/ui-model-menu.py", line 248, in load-model-wrapper
  shared.model, shared.tokenizer = load_model(selected_model, loader)

 File "/home/arjan/oobabooga/text-generation-webui-main/modules/models.py", line 94, in load-model
  output = load_func_map [loader] (model_name)

 File "/home/arjan/oobabooga/text-generation-webui-main/modules/models.py", line 275, in llamacpp_loader
  model, tokenizer = LlamaCppModel.from_pretrained(model_file)

line 34, in load-model

 File "/home/arjan/oobabooga/text-generation-webui-main/modules/llamacpp_model.py", line 39, in from_pretrained
  LlamaCache = llama_cpp_lib().LlamaCache

 File "/home/arjan/oobabooga/text-generation-webui-main/modules/llama_cpp_pgthon_hijack.py", line 39, in llama_cpp_lib
  raise Exception(f"Cannot import '{lib_name}' because '{imported_module}' is already imported. Switching to a different version of llama-cpp-python currently requires a server restart.")
Exception: Cannot import 'llama_cpp_cuda' because 'llama_cpp' is already imported. Switching to a different version of llama-cpp-python currently requires a server restart.
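
For context, the check that raises this exception appears to work roughly as sketched below: only one llama-cpp-python variant can be live per process, so asking for a different variant after one has been imported fails. This is a minimal sketch based only on the error message and traceback; everything except the variant names and the exception text is an assumption about the project's code.

```python
# Minimal sketch of the version guard suggested by the traceback above.
# Only the variant names and the exception text come from the logs; the
# selection logic here is an assumption, not the project's actual code.
import importlib
import sys

VARIANTS = ("llama_cpp", "llama_cpp_cuda", "llama_cpp_cuda_tensorcores")

def llama_cpp_lib(lib_name="llama_cpp"):
    """Import and return the requested llama-cpp-python variant."""
    for imported_module in VARIANTS:
        # Two variants in one process would mix incompatible native builds,
        # so refuse to import a sibling once any variant is already loaded.
        if imported_module != lib_name and imported_module in sys.modules:
            raise Exception(
                f"Cannot import '{lib_name}' because '{imported_module}' is "
                "already imported. Switching to a different version of "
                "llama-cpp-python currently requires a server restart."
            )
    return importlib.import_module(lib_name)
```

If the guard works like this, it would also explain why unloading the model does not help: unloading frees the weights but does not remove the already-imported module from sys.modules, so only a server restart clears it.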

System Info

Virtual machine running on Proxmox
CPU: 4
Memory: 30720 MB
Disk: 60 GB
GPU: none
Ubuntu 24.04
Remowylliams commented 4 months ago

I experience this problem when I switch from CPU-only mode to GPU offloading; going from GPU offloading to CPU-only usually works fine. In all cases I recommend unloading your model first. As for how to fix it: press ^C in your console window and restart the server. Should it work back and forth? Yes, it seems to have in the past, but I won't put money on it.

dark-passages commented 4 months ago

I clicked the unload button several times, I restarted the server several times, and I even installed everything on a clean VM more than once. I never tried switching from CPU-only to GPU mode, because there is no supported GPU in the VM. How can I switch to GPU offloading without a supported GPU?

zba commented 4 months ago

How can I switch to GPU offloading without a supported GPU?

I guess there is no way: if you have a GPU, enable it for the VM; if not, use the CPU. :)

mykeehu commented 3 months ago

I have the same problem when I enable the "--tensorcores" option. I cannot use it even though I have an RTX 3090 card. What is the problem?

07:54:11-020419 ERROR    Failed to load the model.
Traceback (most recent call last):
  File "I:\oobabooga_windows\text-generation-webui\modules\ui_model_menu.py", line 231, in load_model_wrapper
    shared.model, shared.tokenizer = load_model(selected_model, loader)
                                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "I:\oobabooga_windows\text-generation-webui\modules\models.py", line 93, in load_model
    output = load_func_map[loader](model_name)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "I:\oobabooga_windows\text-generation-webui\modules\models.py", line 274, in llamacpp_loader
    model, tokenizer = LlamaCppModel.from_pretrained(model_file)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "I:\oobabooga_windows\text-generation-webui\modules\llamacpp_model.py", line 38, in from_pretrained
    Llama = llama_cpp_lib().Llama
            ^^^^^^^^^^^^^^^
  File "I:\oobabooga_windows\text-generation-webui\modules\llama_cpp_python_hijack.py", line 39, in llama_cpp_lib
    raise Exception(f"Cannot import `{lib_name}` because `{imported_module}` is already imported. Switching to a different version of llama-cpp-python currently requires a server restart.")
Exception: Cannot import `llama_cpp_cuda` because `llama_cpp_cuda_tensorcores` is already imported. Switching to a different version of llama-cpp-python currently requires a server restart.

Update 1: I have the solution. When I launch with the "--tensorcores" flag, the option is not automatically checked on the model settings page, so the previously saved config overwrites it. If I tick it there too, it works fine.
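
That would match the traceback above: the first load imports the tensorcores build selected by the --tensorcores launch flag, and a later load using the saved per-model config (with the box unticked) requests plain llama_cpp_cuda, which the guard refuses. Below is a hypothetical illustration of that mismatch; the setting name and selection logic are assumptions, not the project's actual code.

```python
# Hypothetical illustration of the flag/config mismatch described above;
# the 'tensorcores' setting name is an assumption.
import sys

def variant_for(tensorcores):
    """Map the effective 'tensorcores' setting to a library variant name."""
    return "llama_cpp_cuda_tensorcores" if tensorcores else "llama_cpp_cuda"

# First load: the --tensorcores CLI flag is in effect.
first = variant_for(True)                # 'llama_cpp_cuda_tensorcores'
sys.modules.setdefault(first, object())  # stand-in for the real import

# Second load: the saved per-model config has the box unticked, so the
# loader asks for plain 'llama_cpp_cuda' while the tensorcores build is
# already imported, which is exactly the condition the guard rejects.
second = variant_for(False)              # 'llama_cpp_cuda'
assert second != first and first in sys.modules
```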

b-risk commented 2 months ago

Same issue on Arch; I am using CPU mode.

XJF2332 commented 2 months ago

Similar problem: sometimes I can't use tensor cores, and sometimes I have to use tensor cores.