I am having an issue while trying to load the 7B model TheBloke/Xwin-MLewd-7B-V0.2-GPTQ. The 13B version loaded correctly and gave me no trouble, apart from the response time being a little slow on my setup, but whenever I try the 7B version I get a runtime error.
I have tried both loaders, ExLlamav2_HF and AutoGPTQ.
With ExLlamav2_HF, lowering max_seq_len from 4096 to 2048 did not make a difference (a minimal standalone load attempt is sketched after the traceback below):
Traceback (most recent call last):
File "D:\OOBABOOGA\text-generation-webui-main\modules\ui_model_menu.py", line 244, in load_model_wrapper
shared.model, shared.tokenizer = load_model(selected_model, loader)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\OOBABOOGA\text-generation-webui-main\modules\models.py", line 93, in load_model
output = load_func_map[loader](model_name)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\OOBABOOGA\text-generation-webui-main\modules\models.py", line 325, in ExLlamav2_HF_loader
return Exllamav2HF.from_pretrained(model_name)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\OOBABOOGA\text-generation-webui-main\modules\exllamav2_hf.py", line 181, in from_pretrained
return Exllamav2HF(config)
^^^^^^^^^^^^^^^^^^^
File "D:\OOBABOOGA\text-generation-webui-main\modules\exllamav2_hf.py", line 50, in init
self.ex_model.load(split)
File "D:\OOBABOOGA\text-generation-webui-main\installer_files\env\Lib\site-packages\exllamav2\model.py", line 332, in load
for item in f: x = item
File "D:\OOBABOOGA\text-generation-webui-main\installer_files\env\Lib\site-packages\exllamav2\model.py", line 355, in load_gen
module.load()
File "D:\OOBABOOGA\text-generation-webui-main\installer_files\env\Lib\site-packages\exllamav2\attn.py", line 254, in load
self.q_proj.load()
File "D:\OOBABOOGA\text-generation-webui-main\installer_files\env\Lib\site-packages\exllamav2\linear.py", line 109, in load
self.q_handle = ext.make_q_matrix(w,
^^^^^^^^^^^^^^^^^^^^
File "D:\OOBABOOGA\text-generation-webui-main\installer_files\env\Lib\site-packages\exllamav2\ext.py", line 247, in make_q_matrix
return ext_c.make_q_matrix(w["qweight"],
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: Insufficient size of temp_dq buffer
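For reference, this is roughly what I understand the loader to be doing, in case someone wants to reproduce it outside the webui. This is only a sketch: the model folder path is a placeholder for wherever the webui downloaded the files, and the exllamav2 calls are my assumption of a minimal load sequence, not taken from the webui code.

```python
from exllamav2 import ExLlamaV2, ExLlamaV2Config

# Assumed local path to the downloaded 7B GPTQ folder -- adjust to your own setup.
model_dir = r"D:\OOBABOOGA\text-generation-webui-main\models\TheBloke_Xwin-MLewd-7B-V0.2-GPTQ"

config = ExLlamaV2Config()
config.model_dir = model_dir
config.prepare()               # reads config.json and locates the weight files
config.max_seq_len = 2048      # same reduced context length I tried in the webui

model = ExLlamaV2(config)
model.load()  # the traceback above fails inside this load path (make_q_matrix)
```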
With AutoGPTQ (wbits: 4, groupsize: 128):
Traceback (most recent call last):
File "D:\OOBABOOGA\text-generation-webui-main\modules\ui_model_menu.py", line 244, in load_model_wrapper
shared.model, shared.tokenizer = load_model(selected_model, loader)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\OOBABOOGA\text-generation-webui-main\modules\models.py", line 93, in load_model
output = load_func_map[loader](model_name)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\OOBABOOGA\text-generation-webui-main\modules\models.py", line 312, in AutoGPTQ_loader
return modules.AutoGPTQ_loader.load_quantized(model_name)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\OOBABOOGA\text-generation-webui-main\modules\AutoGPTQ_loader.py", line 59, in load_quantized
model = AutoGPTQForCausalLM.from_quantized(path_to_model, **params)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\OOBABOOGA\text-generation-webui-main\installer_files\env\Lib\site-packages\auto_gptq\modeling\auto.py", line 135, in from_quantized
return quant_func(
^^^^^^^^^^^
File "D:\OOBABOOGA\text-generation-webui-main\installer_files\env\Lib\site-packages\auto_gptq\modeling_base.py", line 1246, in from_quantized
accelerate.utils.modeling.load_checkpoint_in_model(
File "D:\OOBABOOGA\text-generation-webui-main\installer_files\env\Lib\site-packages\accelerate\utils\modeling.py", line 1736, in load_checkpoint_in_model
set_module_tensor_to_device(
File "D:\OOBABOOGA\text-generation-webui-main\installer_files\env\Lib\site-packages\accelerate\utils\modeling.py", line 358, in set_module_tensor_to_device
raise ValueError(
ValueError: Trying to set a tensor of shape torch.Size([32000, 5120]) in "weight" (which has shape torch.Size([32001, 4096])), this look incorrect.
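The shapes in that ValueError are what confuse me: the checkpoint tensor is 32000 x 5120 while the model expects 32001 x 4096. A quick way to see what is actually in the downloaded checkpoint is sketched below; this assumes the folder contains a .safetensors file, and the path and filename are placeholders for whatever is in the model folder.

```python
from safetensors import safe_open

# Assumed path/filename of the downloaded checkpoint -- adjust to your own folder.
ckpt = r"D:\OOBABOOGA\text-generation-webui-main\models\TheBloke_Xwin-MLewd-7B-V0.2-GPTQ\model.safetensors"

with safe_open(ckpt, framework="pt", device="cpu") as f:
    for name in f.keys():
        # Print the embedding / output-head shapes that the ValueError complains about.
        if "embed_tokens" in name or "lm_head" in name:
            print(name, f.get_slice(name).get_shape())
```

For what it's worth, 4096 is the hidden size I would expect for a 7B Llama-family model and 5120 for a 13B one, so the mismatch might mean the folder contains the wrong files, but I am not sure.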