oobabooga / text-generation-webui

A Gradio web UI for Large Language Models.
GNU Affero General Public License v3.0

RuntimeError: "LayerNormKernelImpl" not implemented for 'Half' #1172

Closed · Jake36921 closed this issue 1 year ago

Jake36921 commented 1 year ago

Describe the bug

Tried to generate a response, but no output was produced.

Is there an existing issue for this?

Reproduction

Run the bat file with the arguments `call python server.py --chat --model-dir models --cpu --wbits 4 --groupsize 128`, wait for the model to load, and then click Generate.

Screenshot

No response

Logs

Starting the web UI...

===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
================================================================================
CUDA SETUP: Loading binary E:\Games\oobabooga-windows\installer_files\env\lib\site-packages\bitsandbytes\libbitsandbytes_cpu.dll...
E:\Games\oobabooga-windows\installer_files\env\lib\site-packages\bitsandbytes\cextension.py:31: UserWarning: The installed version of bitsandbytes was compiled without GPU support. 8-bit optimizers and GPU quantization are unavailable.
  warn("The installed version of bitsandbytes was compiled without GPU support. "
The following models are available:

1. alpaca-13b-lora-int4
2. ggml-alpaca-7b-q4.bin
3. OPT-13B-Erebus-4bit-128g

Which one do you want to load? 1-3

3

Loading OPT-13B-Erebus-4bit-128g...
CUDA extension not installed.
Found the following quantized model: models\OPT-13B-Erebus-4bit-128g\OPT-13B-Erebus-4bit-128g.safetensors
Loading model ...
E:\Games\oobabooga-windows\installer_files\env\lib\site-packages\safetensors\torch.py:99: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  with safe_open(filename, framework="pt", device=device) as f:
E:\Games\oobabooga-windows\installer_files\env\lib\site-packages\torch\_utils.py:776: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  return self.fget.__get__(instance, owner)()
E:\Games\oobabooga-windows\installer_files\env\lib\site-packages\torch\storage.py:899: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  storage = cls(wrap_storage=untyped_storage)
Done.
Loaded the model in 186.23 seconds.
Loading the extension "gallery"... Ok.
Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.
Traceback (most recent call last):
  File "E:\Games\oobabooga-windows\text-generation-webui\modules\callbacks.py", line 66, in gentask
    ret = self.mfunc(callback=_callback, **self.kwargs)
  File "E:\Games\oobabooga-windows\text-generation-webui\modules\text_generation.py", line 245, in generate_with_callback
    shared.model.generate(**kwargs)
  File "E:\Games\oobabooga-windows\installer_files\env\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "E:\Games\oobabooga-windows\installer_files\env\lib\site-packages\transformers\generation\utils.py", line 1485, in generate
    return self.sample(
  File "E:\Games\oobabooga-windows\installer_files\env\lib\site-packages\transformers\generation\utils.py", line 2524, in sample
    outputs = self(
  File "E:\Games\oobabooga-windows\installer_files\env\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "E:\Games\oobabooga-windows\installer_files\env\lib\site-packages\transformers\models\opt\modeling_opt.py", line 938, in forward
    outputs = self.model.decoder(
  File "E:\Games\oobabooga-windows\installer_files\env\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "E:\Games\oobabooga-windows\installer_files\env\lib\site-packages\transformers\models\opt\modeling_opt.py", line 704, in forward
    layer_outputs = decoder_layer(
  File "E:\Games\oobabooga-windows\installer_files\env\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "E:\Games\oobabooga-windows\installer_files\env\lib\site-packages\transformers\models\opt\modeling_opt.py", line 326, in forward
    hidden_states = self.self_attn_layer_norm(hidden_states)
  File "E:\Games\oobabooga-windows\installer_files\env\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "E:\Games\oobabooga-windows\installer_files\env\lib\site-packages\torch\nn\modules\normalization.py", line 190, in forward
    return F.layer_norm(
  File "E:\Games\oobabooga-windows\installer_files\env\lib\site-packages\torch\nn\functional.py", line 2515, in layer_norm
    return torch.layer_norm(input, normalized_shape, weight, bias, eps, torch.backends.cudnn.enabled)
RuntimeError: "LayerNormKernelImpl" not implemented for 'Half'
Output generated in 2.74 seconds (0.00 tokens/s, 0 tokens, context 39, seed 1164989743)
[second attempt: identical traceback, ending in the same RuntimeError: "LayerNormKernelImpl" not implemented for 'Half']
Output generated in 0.42 seconds (0.00 tokens/s, 0 tokens, context 39, seed 1514397934)
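The traceback points at `F.layer_norm`: the PyTorch build in use has no float16 (`Half`) LayerNorm kernel for the CPU backend, and with `--cpu` the model's half-precision norm layers are routed to CPU kernels. A minimal sketch of the usual workaround, upcasting to float32 before running on CPU (the module and tensor names here are illustrative, not taken from the webui code):

```python
import torch

# Illustrative LayerNorm in float32; CPU builds of PyTorch have
# historically lacked a float16 kernel for this op.
ln = torch.nn.LayerNorm(8).float()

# Half-precision activations, as a 4-bit/fp16 checkpoint would produce
x = torch.randn(2, 8, dtype=torch.float16)

# Upcast the input so the float32 CPU kernel is used
out = ln(x.float())
print(out.dtype)  # torch.float32
```

In the webui this corresponds to loading the model in float32 when running on CPU rather than keeping fp16 weights.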

System Info

OS Name Microsoft Windows 11 Pro
Version 10.0.22621 Build 22621
Other OS Description    Not Available
OS Manufacturer Microsoft Corporation
System Name JAKE
System Manufacturer ASUS
System Model    System Product Name
System Type x64-based PC
System SKU  SKU
Processor   AMD Ryzen 5 5600G with Radeon Graphics, 3901 Mhz, 6 Core(s), 12 Logical Processor(s)
BIOS Version/Date   American Megatrends Inc. 2006, 20/03/2021
SMBIOS Version  3.3
Embedded Controller Version 255.255
BIOS Mode   UEFI
BaseBoard Manufacturer  ASUSTeK COMPUTER INC.
BaseBoard Product   PRIME A520M-K
BaseBoard Version   Rev X.0x
Platform Role   Desktop
Secure Boot State   On
PCR7 Configuration  Elevation Required to View
Windows Directory   C:\WINDOWS
System Directory    C:\WINDOWS\system32
Boot Device \Device\HarddiskVolume1
Locale  United States
Hardware Abstraction Layer  Version = "10.0.22621.1413"
User Name   Jake
Time Zone   Malay Peninsula Standard Time
Installed Physical Memory (RAM) 16.0 GB
Total Physical Memory   15.3 GB
Available Physical Memory   11.2 GB
Total Virtual Memory    30.6 GB
Available Virtual Memory    16.0 GB
Page File Space 15.3 GB
Page File   C:\pagefile.sys
Kernel DMA Protection   Off
Virtualization-based security   Not enabled
Windows Defender Application Control policy Enforced
Windows Defender Application Control user mode policy   Off
Device Encryption Support   Elevation Required to View
Hyper-V - VM Monitor Mode Extensions    Yes
Hyper-V - Second Level Address Translation Extensions   Yes
Hyper-V - Virtualization Enabled in Firmware    Yes
Hyper-V - Data Execution Protection Yes

Gpu: igpu/AMD Radeon(TM) Graphics
Ph0rk0z commented 1 year ago

How was this model quantized? Did they use act-order plus group size while you are trying to run it on the CUDA kernel?

Oh, I see: it used the 0cc4m GPTQ. It works on my fork, but very, very slowly; autograd fails with a half/float error :(

Erika-wby commented 1 year ago

I get the same error with a Pygmalion model. It's also a safetensors file, if that matters.

FrankDMartinez commented 1 year ago

I get a similar error with facebook/galactica-125m on an Intel Mac.

Ph0rk0z commented 1 year ago

Maybe edit the model's config and try removing `"torch_dtype": "float16"`. Also try flipping the boolean options from false to true (or true to false) and see if anything helps.
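A sketch of that config edit, assuming a standard Hugging Face `config.json` (the config contents and file path here are illustrative, not copied from the actual model):

```python
import json
import os
import tempfile

# Hypothetical config.json contents for the quantized model
cfg = {"model_type": "opt", "torch_dtype": "float16", "use_cache": False}

cfg.pop("torch_dtype", None)  # drop the half-precision hint so the loader falls back to float32
cfg["use_cache"] = True       # the other tweak suggested in this thread

# Write the edited config back out (temp path here; the real file lives
# in the model's own folder under models/)
path = os.path.join(tempfile.mkdtemp(), "config.json")
with open(path, "w") as f:
    json.dump(cfg, f, indent=2)
```

Back up the original `config.json` first, since a bad edit will stop the model from loading at all.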

After setting `"use_cache": true`, I finally get usable output from this model, but only with the https://github.com/johnsmith0031/alpaca_lora_4bit inference code. Regular GPTQ is half that speed, under 1 it/s, even with no context.

Output generated in 11.76 seconds (2.21 tokens/s, 26 tokens, context 611, seed 492669332)

github-actions[bot] commented 1 year ago

This issue has been closed due to inactivity for 6 weeks. If you believe it is still relevant, please leave a comment below. You can tag a developer in your comment.