oobabooga / text-generation-webui

A Gradio web UI for Large Language Models.
GNU Affero General Public License v3.0

Insufficient size of temp_dq buffer when loading model #6493

Open · RedCodedWizard opened this issue 1 month ago

RedCodedWizard commented 1 month ago

I am having an issue while trying to load the 7B model TheBloke/Xwin-MLewd-7B-V0.2-GPTQ. The 13B version loaded correctly and worked fine, apart from somewhat slow response times on my setup, but with the 7B version I keep getting the runtime error below. I have tried both the ExLlamav2_HF and AutoGPTQ loaders.

With ExLlamav2_HF, lowering max_seq_len from 4096 to 2048 did not make a difference:

```
Traceback (most recent call last):
  File "D:\OOBABOOGA\text-generation-webui-main\modules\ui_model_menu.py", line 244, in load_model_wrapper
    shared.model, shared.tokenizer = load_model(selected_model, loader)
                                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\OOBABOOGA\text-generation-webui-main\modules\models.py", line 93, in load_model
    output = load_func_map[loader](model_name)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\OOBABOOGA\text-generation-webui-main\modules\models.py", line 325, in ExLlamav2_HF_loader
    return Exllamav2HF.from_pretrained(model_name)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\OOBABOOGA\text-generation-webui-main\modules\exllamav2_hf.py", line 181, in from_pretrained
    return Exllamav2HF(config)
           ^^^^^^^^^^^^^^^^^^^
  File "D:\OOBABOOGA\text-generation-webui-main\modules\exllamav2_hf.py", line 50, in __init__
    self.ex_model.load(split)
  File "D:\OOBABOOGA\text-generation-webui-main\installer_files\env\Lib\site-packages\exllamav2\model.py", line 332, in load
    for item in f: x = item
  File "D:\OOBABOOGA\text-generation-webui-main\installer_files\env\Lib\site-packages\exllamav2\model.py", line 355, in load_gen
    module.load()
  File "D:\OOBABOOGA\text-generation-webui-main\installer_files\env\Lib\site-packages\exllamav2\attn.py", line 254, in load
    self.q_proj.load()
  File "D:\OOBABOOGA\text-generation-webui-main\installer_files\env\Lib\site-packages\exllamav2\linear.py", line 109, in load
    self.q_handle = ext.make_q_matrix(w,
                    ^^^^^^^^^^^^^^^^^^^^
  File "D:\OOBABOOGA\text-generation-webui-main\installer_files\env\Lib\site-packages\exllamav2\ext.py", line 247, in make_q_matrix
    return ext_c.make_q_matrix(w["qweight"],
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: Insufficient size of temp_dq buffer
```
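For context on what the first error is checking: exllamav2 reserves a temporary dequantization scratch buffer (temp_dq) up front, sized from the layer shapes the model config promises, and make_q_matrix fails when an actual quantized weight needs more scratch space than was reserved. The sketch below is a simplified illustration of that kind of size check, not exllamav2's real implementation; the layer shapes are hypothetical 7B/13B LLaMA-family sizes used only for the example.

```python
# Simplified sketch (NOT exllamav2's actual code) of a scratch-buffer
# size check like the one behind "Insufficient size of temp_dq buffer".

def plan_temp_dq(config_shapes):
    """Reserve one scratch buffer large enough for the biggest
    dequantized matrix the config says we will load."""
    return max(rows * cols for rows, cols in config_shapes)

def make_q_matrix(actual_shape, temp_dq_elems):
    """Fail if the weight actually found on disk does not fit in the
    scratch buffer that was planned from the config."""
    rows, cols = actual_shape
    if rows * cols > temp_dq_elems:
        raise RuntimeError("Insufficient size of temp_dq buffer")

# Buffer planned for 7B-sized layers (hidden size 4096)...
temp_dq = plan_temp_dq([(4096, 4096), (4096, 11008)])

# ...but the checkpoint contains a 13B-sized projection (hidden size 5120),
# so the check fails, just like the traceback above.
try:
    make_q_matrix((5120, 13824), temp_dq)
except RuntimeError as e:
    print(e)
```

If this is what is happening here, it would mean the tensors in the downloaded 7B folder are larger than the 7B config implies, which is worth checking before blaming the loader.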

With AutoGPTQ (wbits: 4, groupsize: 128):

```
Traceback (most recent call last):
  File "D:\OOBABOOGA\text-generation-webui-main\modules\ui_model_menu.py", line 244, in load_model_wrapper
    shared.model, shared.tokenizer = load_model(selected_model, loader)
                                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\OOBABOOGA\text-generation-webui-main\modules\models.py", line 93, in load_model
    output = load_func_map[loader](model_name)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\OOBABOOGA\text-generation-webui-main\modules\models.py", line 312, in AutoGPTQ_loader
    return modules.AutoGPTQ_loader.load_quantized(model_name)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\OOBABOOGA\text-generation-webui-main\modules\AutoGPTQ_loader.py", line 59, in load_quantized
    model = AutoGPTQForCausalLM.from_quantized(path_to_model, **params)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\OOBABOOGA\text-generation-webui-main\installer_files\env\Lib\site-packages\auto_gptq\modeling\auto.py", line 135, in from_quantized
    return quant_func(
           ^^^^^^^^^^^
  File "D:\OOBABOOGA\text-generation-webui-main\installer_files\env\Lib\site-packages\auto_gptq\modeling\_base.py", line 1246, in from_quantized
    accelerate.utils.modeling.load_checkpoint_in_model(
  File "D:\OOBABOOGA\text-generation-webui-main\installer_files\env\Lib\site-packages\accelerate\utils\modeling.py", line 1736, in load_checkpoint_in_model
    set_module_tensor_to_device(
  File "D:\OOBABOOGA\text-generation-webui-main\installer_files\env\Lib\site-packages\accelerate\utils\modeling.py", line 358, in set_module_tensor_to_device
    raise ValueError(
ValueError: Trying to set a tensor of shape torch.Size([32000, 5120]) in "weight" (which has shape torch.Size([32001, 4096])), this look incorrect.
```
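The ValueError from the AutoGPTQ attempt is a plain shape mismatch: the tensor stored on disk is 32000 × 5120 (5120 is the hidden size of a 13B LLaMA-family model), while the parameter built from the 7B config is 32001 × 4096. That suggests the files in the model folder may not match its config. A minimal sketch of the consistency check accelerate is performing at that point (the function name here is hypothetical, not accelerate's internal API):

```python
# Hypothetical sketch of the shape-consistency check that fails in the
# second traceback: a checkpoint tensor must exactly match the shape of
# the model parameter it is loaded into.

def check_tensor_shape(name, checkpoint_shape, expected_shape):
    """Raise if a checkpoint tensor cannot fill the model parameter."""
    if tuple(checkpoint_shape) != tuple(expected_shape):
        raise ValueError(
            f'Trying to set a tensor of shape {checkpoint_shape} in "{name}" '
            f"(which has shape {expected_shape}), this look incorrect."
        )

# Shapes taken from the traceback: a 13B-sized embedding on disk
# (vocab 32000, hidden 5120) vs. the 7B config's expectation
# (vocab padded to 32001, hidden 4096).
try:
    check_tensor_shape("weight", (32000, 5120), (32001, 4096))
except ValueError as e:
    print(e)
```

If both loaders fail on the same folder with size-related errors, re-downloading the 7B repository and confirming the safetensors file sizes against the Hugging Face listing would be a reasonable first step.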