mit-han-lab / streaming-llm

[ICLR 2024] Efficient Streaming Language Models with Attention Sinks
https://arxiv.org/abs/2309.17453
MIT License

TypeError: llama_pos_shift_attention_forward() got an unexpected keyword argument 'padding_mask' #14

Closed MartinKratochvilProgramy closed 1 year ago

MartinKratochvilProgramy commented 1 year ago

I'm trying to reproduce the work following the README, but I'm getting "TypeError: llama_pos_shift_attention_forward() got an unexpected keyword argument 'padding_mask'" at line 103 of run_streaming_llama.py. I'm not sure how to debug this, since there is no padding_mask anywhere in the code.

tomaarsen commented 1 year ago

You must downgrade transformers to 4.33.0:

pip install transformers==4.33.0
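
To confirm which version is active afterwards (just a quick sanity check, not part of the repo):

import transformers

print(transformers.__version__)  # expect 4.33.0 after the downgrade
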
alexcannan commented 1 year ago

I simply added a padding_mask parameter to streaming_llm/pos_shift/modify_llama.llama_pos_shift_attention_forward and it works as expected; that's probably all that's needed for compatibility with the latest transformers package.

I only guessed on the typing:

from typing import Optional, Tuple

import torch


def llama_pos_shift_attention_forward(
    self,
    hidden_states: torch.Tensor,
    attention_mask: Optional[torch.Tensor] = None,
    position_ids: Optional[torch.LongTensor] = None,
    past_key_value: Optional[Tuple[torch.Tensor]] = None,
    output_attentions: bool = False,
    use_cache: bool = False,
    padding_mask: Optional[torch.Tensor] = None,  # accepted for newer transformers, never used
) -> Tuple[torch.Tensor, Optional[torch.Tensor], Optional[Tuple[torch.Tensor]]]:
tomaarsen commented 1 year ago

That's correct. The parameter isn't actually used; it just has to exist. Downgrading transformers also works.
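
If you want the patch to keep working as transformers evolves, another option (just a sketch, not the repo's actual code) is to absorb unknown keyword arguments instead of naming each one:

def llama_pos_shift_attention_forward(
    self,
    hidden_states: torch.Tensor,
    attention_mask: Optional[torch.Tensor] = None,
    position_ids: Optional[torch.LongTensor] = None,
    past_key_value: Optional[Tuple[torch.Tensor]] = None,
    output_attentions: bool = False,
    use_cache: bool = False,
    **kwargs,  # swallows padding_mask and any future keyword arguments; they are simply ignored
) -> Tuple[torch.Tensor, Optional[torch.Tensor], Optional[Tuple[torch.Tensor]]]: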

Guangxuan-Xiao commented 1 year ago

Thank you, @tomaarsen and @alexcannan! I've added the transformers version specification to the README.