turboderp / exllama

A more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights.
MIT License

Occasional RuntimeError #307

Open leegohi04517 opened 7 months ago

leegohi04517 commented 7 months ago

Traceback (most recent call last):
  File "/home/ubuntu/text-generation-webui/modules/callbacks.py", line 56, in gentask
    ret = self.mfunc(callback=_callback, *args, **self.kwargs)
  File "/home/ubuntu/text-generation-webui/modules/text_generation.py", line 361, in generate_with_callback
    shared.model.generate(**kwargs)
  File "/home/ubuntu/miniconda3/envs/textgen/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/ubuntu/miniconda3/envs/textgen/lib/python3.10/site-packages/transformers/generation/utils.py", line 1652, in generate
    return self.sample(
  File "/home/ubuntu/miniconda3/envs/textgen/lib/python3.10/site-packages/transformers/generation/utils.py", line 2734, in sample
    outputs = self(
  File "/home/ubuntu/text-generation-webui/modules/exllama_hf.py", line 96, in __call__
    self.ex_model.forward(seq_tensor[longest_prefix:-1].view(1, -1), ex_cache, preprocess_only=True, lora=self.lora)
  File "/home/ubuntu/miniconda3/envs/textgen/lib/python3.10/site-packages/exllama/model.py", line 972, in forward
    r = self._forward(input_ids[:, chunk_begin : chunk_end],
  File "/home/ubuntu/miniconda3/envs/textgen/lib/python3.10/site-packages/exllama/model.py", line 1058, in _forward
    hidden_states = decoder_layer.forward(hidden_states, cache, buffers[device], lora)
  File "/home/ubuntu/miniconda3/envs/textgen/lib/python3.10/site-packages/exllama/model.py", line 536, in forward
    hidden_states = self.self_attn.forward(hidden_states, cache, buffer, lora)
  File "/home/ubuntu/miniconda3/envs/textgen/lib/python3.10/site-packages/exllama/model.py", line 491, in forward
    attn_output = F.scaled_dot_product_attention(query_states, key_states, value_states, attn_mask = buffer.attn_mask, is_causal = False)
RuntimeError: The expanded size of the tensor (641) must match the existing size (640) at non-singleton dimension 3. Target sizes: [1, 40, 621, 641]. Tensor sizes: [1, 1, 621, 640]

Environment: text-generation-webui v1.7 with exllama 0.0.17. The error is intermittent: the same prompt with the same sampling parameters sometimes fails with this error and sometimes works fine.
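
To make the shape mismatch in the error message easier to read, here is a small pure-Python sketch of the expand rule that the RuntimeError is reporting. It is not exllama or PyTorch code; `first_expand_mismatch` is a hypothetical helper written only for illustration, and it assumes both shapes have the same rank. Reading the last two dimensions as (query length, key length) follows the usual `scaled_dot_product_attention` convention. The conclusion that the mask was built for one fewer cached position than the keys is my guess from the numbers, not something confirmed in the exllama source.

```python
def first_expand_mismatch(src, target):
    """Return (dim, src_size, target_size) for the first dimension where a
    tensor of shape `src` cannot be expanded to `target`, or None if it can.

    Hypothetical helper: a dimension is expandable only when the sizes are
    equal or the source size is 1.
    """
    if len(src) != len(target):
        raise ValueError("this sketch only handles shapes of equal rank")
    for dim, (s, t) in enumerate(zip(src, target)):
        if s != t and s != 1:
            return dim, s, t
    return None


# Shapes copied from the error message above.
mask_shape = (1, 1, 621, 640)     # "Tensor sizes": buffer.attn_mask
scores_shape = (1, 40, 621, 641)  # "Target sizes": (batch, heads, queries, keys)

mismatch = first_expand_mismatch(mask_shape, scores_shape)
print("mismatch (dim, mask, target):", mismatch)

# Assumption: the last dimension is past_len + q_len, with q_len = 621.
q_len = scores_shape[2]
past_len_seen_by_keys = scores_shape[3] - q_len
past_len_seen_by_mask = mask_shape[3] - q_len
print("cached positions implied by keys:", past_len_seen_by_keys)
print("cached positions implied by mask:", past_len_seen_by_mask)
```

Dimension 1 (1 vs. 40 heads) expands fine; only dimension 3 fails, which would fit the mask and the key/value cache disagreeing by one position about how many tokens are already cached.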