I'm having trouble loading models into the WebUI. I followed the guides I found online, and after starting the WebUI, two different errors appear depending on the model.

TheBloke/vicuna-13B-1.1-GPTQ-4bit-128g
"2023-08-02 18:31:17 ERROR:Failed to load the model.
Traceback (most recent call last):
File "F:\oobabooga_windows\text-generation-webui\server.py", line 68, in load_model_wrapper
shared.model, shared.tokenizer = load_model(shared.model_name, loader)
File "F:\oobabooga_windows\text-generation-webui\modules\models.py", line 78, in load_model
output = load_func_map[loader]
File "F:\oobabooga_windows\text-generation-webui\modules\models.py", line 287, in AutoGPTQ_loader
return modules.AutoGPTQ_loader.load_quantized(model_name)
File "F:\oobabooga_windows\text-generation-webui\modules\AutoGPTQ_loader.py", line 53, in load_quantized
model = AutoGPTQForCausalLM.from_quantized(path_to_model, **params)
File "F:\oobabooga_windows\installer_files\env\lib\site-packages\auto_gptq\modeling\auto.py", line 94, in from_quantized
return quant_func(
File "F:\oobabooga_windows\installer_files\env\lib\site-packages\auto_gptq\modeling_base.py", line 793, in from_quantized
accelerate.utils.modeling.load_checkpoint_in_model(
File "F:\oobabooga_windows\installer_files\env\lib\site-packages\accelerate\utils\modeling.py", line 1336, in load_checkpoint_in_model
set_module_tensor_to_device(
File "F:\oobabooga_windows\installer_files\env\lib\site-packages\accelerate\utils\modeling.py", line 298, in set_module_tensor_to_device
new_value = value.to(device)
RuntimeError: CUDA error: out of memory
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions."

TheBloke_chronos-hermes-13B-GPTQ\chronos-hermes-13b-GPTQ-4bit-128g
"WARNING:The safetensors archive passed at models\TheBloke_chronos-hermes-13B-GPTQ\chronos-hermes-13b-GPTQ-4bit-128g.no-act.order.safetensors does not contain metadata. Make sure to save your model with the save_pretrained method. Defaulting to 'pt' metadata."
My hardware: RTX 4070 (12 GB VRAM), 16 GB RAM, Intel i5-12400F.
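For reference, here is a rough back-of-the-envelope estimate I did for the weights alone (my assumptions: ~13B parameters at 4-bit quantization; this ignores activations, the KV cache, and the CUDA context overhead, which all add on top):

```python
# Rough VRAM estimate for the weights of a 4-bit GPTQ 13B model.
# Assumption: 13e9 parameters, 0.5 bytes per parameter (4-bit).
# Does NOT include activations, KV cache, or CUDA context overhead.
params = 13e9
bytes_per_param = 0.5
weights_gb = params * bytes_per_param / 1024**3
print(f"weights alone: {weights_gb:.1f} GB")  # ≈ 6.1 GB
```

So the weights should fit in 12 GB on paper, which is why the out-of-memory error surprises me; maybe the actual free VRAM at load time is lower (you can check it with `torch.cuda.mem_get_info()` if PyTorch is installed in the same environment), or the overhead during loading pushes it over.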
What am I doing wrong, and how can I fix it?