Closed · yuemengrui closed this issue 5 months ago
Hi @yuemengrui,
I haven't closely examined this error, but from a preliminary look I suspect it stems from differences in how ChatGLM3 implements the attention_mask. You might try commenting out https://github.com/microsoft/LLMLingua/blob/main/llmlingua/prompt_compressor.py#L133 to see if that resolves the issue.
```
File "/Users/yuemengrui/.cache/huggingface/modules/transformers_modules/chatglm3-6b-32k/modeling_chatglm.py", line 822, in forward
    full_attention_mask = self.get_masks(input_ids, past_key_values, padding_mask=attention_mask)
File "/Users/yuemengrui/.cache/huggingface/modules/transformers_modules/chatglm3-6b-32k/modeling_chatglm.py", line 691, in get_masks
    full_attention_mask = full_attention_mask * padding_mask.unsqueeze(1)
RuntimeError: The size of tensor a (909) must match the size of tensor b (534) at non-singleton dimension 2
```
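For context, the shape mismatch in the traceback can be reproduced in isolation. This is a minimal sketch, not ChatGLM3's actual code: it assumes `get_masks` builds a square `(batch, seq, seq)` causal mask from `input_ids` (length 909 here), while the caller passes an `attention_mask` built from a shorter sequence (534), so the elementwise multiply cannot broadcast along dimension 2. The specific lengths are taken from the error message; everything else is illustrative.

```python
import torch

# Lengths taken from the traceback: the model sees a 909-token input,
# but the padding mask passed in covers only 534 tokens.
seq_len_model = 909
seq_len_mask = 534

# ChatGLM-style full mask derived from input_ids: (batch, seq, seq)
full_attention_mask = torch.ones(1, seq_len_model, seq_len_model)

# Padding mask supplied by the caller: (batch, seq)
padding_mask = torch.ones(1, seq_len_mask)

try:
    # Same operation as get_masks(): unsqueeze gives (1, 1, 534), which
    # cannot broadcast against (1, 909, 909) at dimension 2.
    full_attention_mask * padding_mask.unsqueeze(1)
except RuntimeError as e:
    print(e)
```

This is why removing the `attention_mask` that LLMLingua constructs (the line referenced above) can sidestep the error: without a caller-supplied mask, ChatGLM3 builds its mask from `input_ids` alone, so the two lengths cannot disagree.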