oobabooga / text-generation-webui

A Gradio web UI for Large Language Models.
GNU Affero General Public License v3.0

AWQ model error: ERROR Failed to load the model. NotImplementedError: Cannot copy out of meta tensor; no data! #5736

Open · guispfilho opened this issue 3 months ago

guispfilho commented 3 months ago

Describe the bug

Tried two AWQ models and got the same type of error. The model downloads successfully, but loading always breaks at 60~62% while "Fusing layers...".

Is there an existing issue for this?

Reproduction

text-generation-webui downloads any model correctly, but the error occurs when clicking "Load" with an AWQ model selected.
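
For reference, the same code path can be exercised outside the web UI by calling AutoAWQ directly. A minimal sketch (the model ID below is a placeholder, not one of the models from this report):

    # Minimal standalone reproduction sketch using AutoAWQ directly.
    # The model ID is a placeholder; substitute the AWQ quant being tested.
    from awq import AutoAWQForCausalLM

    model = AutoAWQForCausalLM.from_quantized(
        "TheBloke/Llama-2-7B-Chat-AWQ",
        fuse_layers=True,  # layer fusing is the step that crashes at ~60%
    )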

Screenshot

(screenshot of the error omitted; original attachment "Captura de tela 2024-03-19 202913", a screen capture dated 2024-03-19)

Logs

Replacing layers...: 100%|█████████████████████████████████████████████████████████████| 40/40 [00:04<00:00,  8.05it/s]
Fusing layers...:  60%|██████████████████████████████████████▍                         | 24/40 [00:08<00:05,  2.96it/s]
20:22:32-700933 ERROR    Failed to load the model.
Traceback (most recent call last):
  File "C:\Users\guisp\Arquivos\Fooocus_win64_2-1-831\text-generation-webui-main\modules\ui_model_menu.py", line 245, in load_model_wrapper
    shared.model, shared.tokenizer = load_model(selected_model, loader)
                                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\guisp\Arquivos\Fooocus_win64_2-1-831\text-generation-webui-main\modules\models.py", line 87, in load_model
    output = load_func_map[loader](model_name)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\guisp\Arquivos\Fooocus_win64_2-1-831\text-generation-webui-main\modules\models.py", line 302, in AutoAWQ_loader
    model = AutoAWQForCausalLM.from_quantized(
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\guisp\Arquivos\Fooocus_win64_2-1-831\text-generation-webui-main\installer_files\env\Lib\site-packages\awq\models\auto.py", line 94, in from_quantized
    return AWQ_CAUSAL_LM_MODEL_MAP[model_type].from_quantized(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\guisp\Arquivos\Fooocus_win64_2-1-831\text-generation-webui-main\installer_files\env\Lib\site-packages\awq\models\base.py", line 440, in from_quantized
    self.fuse_layers(model)
  File "C:\Users\guisp\Arquivos\Fooocus_win64_2-1-831\text-generation-webui-main\installer_files\env\Lib\site-packages\awq\models\llama.py", line 21, in fuse_layers
    fuser.fuse_transformer()
  File "C:\Users\guisp\Arquivos\Fooocus_win64_2-1-831\text-generation-webui-main\installer_files\env\Lib\site-packages\awq\models\llama.py", line 117, in fuse_transformer
    LlamaLikeBlock(
  File "C:\Users\guisp\Arquivos\Fooocus_win64_2-1-831\text-generation-webui-main\installer_files\env\Lib\site-packages\awq\modules\fused\block.py", line 88, in __init__
    self.norm_1 = norm_1.to(dev)
                  ^^^^^^^^^^^^^^
  File "C:\Users\guisp\Arquivos\Fooocus_win64_2-1-831\text-generation-webui-main\installer_files\env\Lib\site-packages\torch\nn\modules\module.py", line 1152, in to
    return self._apply(convert)
           ^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\guisp\Arquivos\Fooocus_win64_2-1-831\text-generation-webui-main\installer_files\env\Lib\site-packages\torch\nn\modules\module.py", line 825, in _apply
    param_applied = fn(param)
                    ^^^^^^^^^
  File "C:\Users\guisp\Arquivos\Fooocus_win64_2-1-831\text-generation-webui-main\installer_files\env\Lib\site-packages\torch\nn\modules\module.py", line 1150, in convert
    return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
NotImplementedError: Cannot copy out of meta tensor; no data!
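
A note on the error: a "meta" tensor carries only shape and dtype, no storage. Weights are typically left on the meta device when they could not be materialized on a real device (for instance, when max_memory offloading does not leave room for the whole model), so the norm_1.to(dev) call during fusing has no data to copy, hence the NotImplementedError. A possible workaround, sketched under the assumption that fusing is optional for this model, is to load without fused layers:

    # Workaround sketch: skip layer fusing so AutoAWQ never attempts
    # the .to(dev) copy that fails on meta tensors.
    from awq import AutoAWQForCausalLM

    model = AutoAWQForCausalLM.from_quantized(
        "TheBloke/Llama-2-7B-Chat-AWQ",  # placeholder model ID
        fuse_layers=False,
    )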

System Info

CPU: i5-13450
GPU: RTX 3050
RAM: 16 GB
OS: Windows 11
indigotechtutorials commented 3 months ago

Same issue on Mac.

mailsonm commented 2 months ago

I have the same problem but on Linux. CPU: i7-11600H, GPU: RTX 3050.
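
In the web UI itself, the equivalent switch (an assumption based on builds that still ship the AutoAWQ loader, which passes this option through as fuse_layers=False) is the no_inject_fused_attention option:

    # Assumption: this flag maps to fuse_layers=False in the AutoAWQ loader.
    python server.py --loader AutoAWQ --no_inject_fused_attention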