microsoft / LLMLingua

To speed up LLM inference and enhance LLMs' perception of key information, LLMLingua compresses the prompt and KV-Cache, achieving up to 20x compression with minimal performance loss.
https://llmlingua.com/
MIT License

error on chatglm3-6b-32k #82

Closed — yuemengrui closed this 5 months ago

yuemengrui commented 5 months ago

```
File "/Users/yuemengrui/.cache/huggingface/modules/transformers_modules/chatglm3-6b-32k/modeling_chatglm.py", line 822, in forward
    full_attention_mask = self.get_masks(input_ids, past_key_values, padding_mask=attention_mask)
File "/Users/yuemengrui/.cache/huggingface/modules/transformers_modules/chatglm3-6b-32k/modeling_chatglm.py", line 691, in get_masks
    full_attention_mask = full_attention_mask * padding_mask.unsqueeze(1)
RuntimeError: The size of tensor a (909) must match the size of tensor b (534) at non-singleton dimension 2
```
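For context, the failure in `get_masks` is a plain broadcasting mismatch: the full attention mask is built from `input_ids` (sequence length 909 here) while the `padding_mask` passed in has a shorter sequence dimension (534). A minimal sketch of the same failure, with the shapes taken from the traceback (the tensor contents are placeholders):

```python
import torch

# Shapes taken from the traceback; ones() stands in for the real masks.
full_attention_mask = torch.ones(1, 909, 909)  # derived from input_ids
padding_mask = torch.ones(1, 534)              # shorter attention_mask

try:
    # Mirrors modeling_chatglm.py line 691: (1, 909, 909) * (1, 1, 534)
    full_attention_mask * padding_mask.unsqueeze(1)
except RuntimeError as e:
    print(e)  # size mismatch at non-singleton dimension 2
```

This suggests the `attention_mask` handed to ChatGLM3's `forward` was built against a different (shorter) token sequence than `input_ids`.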

iofu728 commented 5 months ago

Hi @yuemengrui,

I haven't closely examined this error, but from a preliminary analysis I suspect it is due to differences in how ChatGLM3 implements the attention_mask. You might try commenting out https://github.com/microsoft/LLMLingua/blob/main/llmlingua/prompt_compressor.py#L133 to see whether that resolves the issue.