dark-passages opened 4 months ago
I experience this problem when I switch from CPU-only mode to GPU offloading; going from GPU offloading to CPU-only usually works fine. In all cases I recommend unloading your model first. As for how to fix it: press Ctrl+C in your console window and restart the server. Should it work back and forth? Yes, it seems to have in the past, but I won't put money on it.
I clicked the unload button several times, restarted the server several times, and even installed everything on a clean VM more than once. I never tried switching from CPU-only to GPU mode, because there is no supported GPU in the VM. How can I switch to GPU offloading without a supported GPU?
> How can I switch to GPU offloading without a supported GPU?
I guess there's no way: if you have a GPU, enable it for the VM; if not, use CPU. :)
I have the same problem when I enable the "--tensorcores" option. I cannot use it even though I have an RTX 3090 card. What is the problem?
07:54:11-020419 ERROR Failed to load the model.
Traceback (most recent call last):
  File "I:\oobabooga_windows\text-generation-webui\modules\ui_model_menu.py", line 231, in load_model_wrapper
    shared.model, shared.tokenizer = load_model(selected_model, loader)
                                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "I:\oobabooga_windows\text-generation-webui\modules\models.py", line 93, in load_model
    output = load_func_map[loader](model_name)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "I:\oobabooga_windows\text-generation-webui\modules\models.py", line 274, in llamacpp_loader
    model, tokenizer = LlamaCppModel.from_pretrained(model_file)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "I:\oobabooga_windows\text-generation-webui\modules\llamacpp_model.py", line 38, in from_pretrained
    Llama = llama_cpp_lib().Llama
            ^^^^^^^^^^^^^^^
  File "I:\oobabooga_windows\text-generation-webui\modules\llama_cpp_python_hijack.py", line 39, in llama_cpp_lib
    raise Exception(f"Cannot import `{lib_name}` because `{imported_module}` is already imported. Switching to a different version of llama-cpp-python currently requires a server restart.")
Exception: Cannot import `llama_cpp_cuda` because `llama_cpp_cuda_tensorcores` is already imported. Switching to a different version of llama-cpp-python currently requires a server restart.
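For anyone wondering where this error comes from: the webui ships several builds of llama-cpp-python (CPU-only, CUDA, and CUDA with tensor cores), and `modules/llama_cpp_python_hijack.py` remembers which one the process has already imported. Here is a minimal sketch of that guard, assuming the variant is chosen from the launch settings; the parameter names are illustrative, only the module names and the exception text come from the traceback above:

```python
import importlib

# Remembered for the lifetime of the server process; this is why
# unloading the model is not enough and only a restart helps.
imported_module = None

def llama_cpp_lib(cpu_only=False, tensorcores=False):
    """Return the llama-cpp-python variant matching the current settings."""
    global imported_module

    if cpu_only:
        lib_name = 'llama_cpp'
    elif tensorcores:
        lib_name = 'llama_cpp_cuda_tensorcores'
    else:
        lib_name = 'llama_cpp_cuda'

    # Refuse to mix variants within one process.
    if imported_module is not None and imported_module != lib_name:
        raise Exception(
            f"Cannot import `{lib_name}` because `{imported_module}` is already "
            f"imported. Switching to a different version of llama-cpp-python "
            f"currently requires a server restart."
        )

    module = importlib.import_module(lib_name)
    imported_module = lib_name
    return module
```

This also explains why the fix in the update below works: keeping the UI checkbox in agreement with the `--tensorcores` launch flag means every load request asks for the same variant.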
Update 1: I found the solution. When I launch with the "--tensorcores" flag, the tensorcores option is not automatically ticked on the model settings tab, so the previously saved config overwrites it on load. If I tick it there too, everything works fine.
Same issue on Arch; I am using CPU-only mode.
Similar problem: sometimes I can't use tensor cores, and sometimes I'm forced to use them.
Describe the bug
After downloading a model, I try to load it but get this message in the console: Exception: Cannot import `llama_cpp_cuda` because `llama_cpp` is already imported. Switching to a different version of llama-cpp-python currently requires a server restart.
Reproduction
1. Install Ubuntu 24.04; update & upgrade.
2. Clone the git repo.
3. Run `bash start_linux.sh --listen --gradio-auth user:passw`.
4. Open the site in a browser.
5. Select any GGUF model and download it.
6. Refresh the list of models.
7. Select the GGUF model.
8. Press Load.
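A quick diagnostic sketch (not part of the webui; the three module names are taken from the error messages in this thread) to see which variants are installed in the environment the server runs in, e.g. the Python under `installer_files/env` when using the one-click installer:

```python
# Diagnostic: list which llama-cpp-python variants this environment has.
# Run it with the same Python interpreter the webui server uses.
import importlib.util

for name in ("llama_cpp", "llama_cpp_cuda", "llama_cpp_cuda_tensorcores"):
    spec = importlib.util.find_spec(name)
    print(f"{name}: {'installed' if spec is not None else 'not installed'}")
```

If only the CPU build shows up, GPU offloading with the llama.cpp loader cannot work in that environment, regardless of the loader settings.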