TitleOS opened this issue 2 months ago
System Specs: Ryzen 5600G, Nvidia Tesla M40 24GB, 128GB DDR4 RAM
Error:
```
running layers(cuda:0):   1%|▍  | 1/129 [00:06<14:44,  6.91s/it]
Traceback (most recent call last):
  File "C:\Users\Darkl\OneDrive\source\repos\NeuralSlice\inference_405B_4bit.py", line 14, in <module>
    generation_output = model.generate(
                        ^^^^^^^^^^^^^^^
  File "C:\Users\Darkl\OneDrive\source\repos\NeuralSlice\NeuralSliceEnv\Lib\site-packages\torch\utils\_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Darkl\OneDrive\source\repos\NeuralSlice\NeuralSliceEnv\Lib\site-packages\transformers\generation\utils.py", line 2024, in generate
    result = self._sample(
             ^^^^^^^^^^^^^
  File "C:\Users\Darkl\OneDrive\source\repos\NeuralSlice\NeuralSliceEnv\Lib\site-packages\transformers\generation\utils.py", line 2982, in _sample
    outputs = self(**model_inputs, return_dict=True)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Darkl\OneDrive\source\repos\NeuralSlice\NeuralSliceEnv\Lib\site-packages\airllm\airllm_base.py", line 369, in __call__
    return self.forward(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Darkl\OneDrive\source\repos\NeuralSlice\NeuralSliceEnv\Lib\site-packages\airllm\airllm_base.py", line 569, in forward
    new_seq = layer(seq, **kwargs)[0]
              ^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Darkl\OneDrive\source\repos\NeuralSlice\NeuralSliceEnv\Lib\site-packages\torch\nn\modules\module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Darkl\OneDrive\source\repos\NeuralSlice\NeuralSliceEnv\Lib\site-packages\torch\nn\modules\module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Darkl\OneDrive\source\repos\NeuralSlice\NeuralSliceEnv\Lib\site-packages\transformers\models\llama\modeling_llama.py", line 734, in forward
    hidden_states, self_attn_weights, present_key_value = self.self_attn(
                                                          ^^^^^^^^^^^^^^^
  File "C:\Users\Darkl\OneDrive\source\repos\NeuralSlice\NeuralSliceEnv\Lib\site-packages\torch\nn\modules\module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Darkl\OneDrive\source\repos\NeuralSlice\NeuralSliceEnv\Lib\site-packages\torch\nn\modules\module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Darkl\OneDrive\source\repos\NeuralSlice\NeuralSliceEnv\Lib\site-packages\transformers\models\llama\modeling_llama.py", line 622, in forward
    key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(1, 2)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: shape '[1, 5, 8, 128]' is invalid for input of size 10240
```
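One way to read the numbers in the error (my interpretation, not confirmed elsewhere in this thread): the `view` expects `bsz * q_len * num_key_value_heads * head_dim` elements, but the key projection produced exactly twice that many per token, which suggests the weights being loaded don't match the `num_key_value_heads=8` the Llama config declares. A quick sanity check of the arithmetic:

```python
# Dimensions taken straight from the RuntimeError message:
# shape '[1, 5, 8, 128]' is invalid for input of size 10240
bsz, q_len, num_kv_heads, head_dim = 1, 5, 8, 128

expected = bsz * q_len * num_kv_heads * head_dim  # what .view() needs
actual = 10240                                    # what key_states actually holds

print(expected)            # 5120
print(actual // expected)  # 2 -> twice as many elements per token as
                           #      8 KV heads x 128 head_dim would give
```

So the tensor coming out of `k_proj` is sized as if there were 16 KV heads (or an un-dequantized/doubled weight), pointing at a mismatch between the 4-bit checkpoint's stored weight shapes and how airllm reconstructs them, rather than at the prompt or generation settings.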
My inference code:
```python
from airllm import AutoModel

model = AutoModel.from_pretrained(
    "unsloth/Meta-Llama-3.1-405B-Instruct-bnb-4bit",
    delete_original=True)

input_text = input("Prompt the almighty 405B Llama: ")
input_tokens = model.tokenizer(
    input_text,
    return_tensors="pt",
    return_attention_mask=False,
    truncation=True,
    max_length=128,
    padding=False)

generation_output = model.generate(
    input_tokens['input_ids'].cuda(),
    max_new_tokens=10,
    return_dict_in_generate=True)

output = model.tokenizer.decode(generation_output.sequences[0])
print(output)
```
PIP List:
```
(NeuralSliceEnv) C:\Users\Darkl\OneDrive\source\repos\NeuralSlice>pip list
Package            Version
------------------ ------------
accelerate         0.33.0
aiohappyeyeballs   2.4.0
aiohttp            3.10.5
aiosignal          1.3.1
airllm             2.10.2
attrs              24.2.0
bitsandbytes       0.43.3
certifi            2024.7.4
charset-normalizer 3.3.2
colorama           0.4.6
coloredlogs        15.0.1
datasets           2.21.0
dill               0.3.8
filelock           3.15.4
frozenlist         1.4.1
fsspec             2024.6.1
huggingface-hub    0.24.6
humanfriendly      10.0
idna               3.8
Jinja2             3.1.4
MarkupSafe         2.1.5
mpmath             1.3.0
multidict          6.0.5
multiprocess       0.70.16
networkx           3.3
numpy              1.26.4
optimum            1.21.4
packaging          24.1
pandas             2.2.2
pillow             10.2.0
pip                24.2
protobuf           5.27.3
psutil             6.0.0
pyarrow            17.0.0
pyreadline3        3.4.1
python-dateutil    2.9.0.post0
pytz               2024.1
PyYAML             6.0.2
regex              2024.7.24
requests           2.32.3
safetensors        0.4.4
scipy              1.14.1
sentencepiece      0.2.0
setuptools         65.5.0
six                1.16.0
sympy              1.13.2
tokenizers         0.19.1
torch              2.4.0+cu121
torchaudio         2.4.0+cu121
torchvision        0.19.0+cu121
tqdm               4.66.5
transformers       4.44.2
typing_extensions  4.12.2
tzdata             2024.1
urllib3            2.2.2
xxhash             3.5.0
yarl               1.9.4
```
Same error here. Has anyone had any luck?
I posted a potential quick fix on the other issue here.
I can confirm this fix worked for me, and I was able to run inference with Llama 405B 4-bit on my setup. Thank you!