princeton-nlp / CEPE

[ACL 2024] Long-Context Language Modeling with Parallel Encodings
https://arxiv.org/abs/2402.16617
MIT License

An Issue on Reproducing StreamingLLM #3


Ocean-627 commented 1 month ago

Congratulations on your excellent work! I attempted to run bash scripts/run_streamingllm_lm.sh to reproduce the results of streaming_llm, but I encountered the following error:

TypeError: llama_pos_shift_attention_forward() got an unexpected keyword argument 'padding_mask'

It seems that the original streaming_llm code only works with transformers versions below 4.34.0, but this project requires version 4.34.1. Have you encountered the same issue? If so, do you have any solutions? Thank you very much!
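To illustrate, here is a minimal stand-in (hypothetical names, not the actual transformers or streaming_llm code) of how the mismatch produces this error: transformers 4.34.x started passing a padding_mask keyword into the attention forward, while the monkey-patched pos-shift forward keeps a pre-4.34 signature that does not accept it.

```python
# Minimal stand-in illustrating the mismatch (hypothetical names, not the real
# transformers/streaming_llm code): the patched attention forward has a
# pre-4.34 signature, while transformers >= 4.34 passes `padding_mask`.

def patched_attention_forward(hidden_states, attention_mask=None,
                              position_ids=None, past_key_value=None,
                              output_attentions=False, use_cache=False):
    """Stand-in for the old, pre-4.34-style forward signature."""
    return hidden_states

try:
    # transformers 4.34.x calls the attention module with the extra keyword:
    patched_attention_forward(hidden_states=None, padding_mask=None)
except TypeError as err:
    print(err)  # ... got an unexpected keyword argument 'padding_mask'
```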

howard-yen commented 1 month ago

Thank you for your interest in our work :) I ran into a similar issue as well; apologies for not including the fix in this repo. I resolved it by applying a patch on top of the original streaming_llm repo. To apply the fix:

  1. clone the original repo
  2. replace the file streaming_llm/pos_shift/modify_llama.py with this file; the only changes I made are adding the padding_mask argument to the pos_shift forward function and adding support for FlashAttention2 (see the sketch after this list).
  3. Install this version of streaming_llm from source via python setup.py develop in the base directory. Alternatively, you can just add this new version to your path if you don't want to go through the installation process.
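For reference, a rough sketch of the signature-level change from step 2 (the parameter list below is an approximation; the actual body of the patched modify_llama.py and the added FlashAttention2 path are not reproduced here): the patched forward just needs to accept the new keyword.

```python
from typing import Optional, Tuple
import torch

# Rough sketch of the signature change (approximate parameter list; the
# attention computation itself and the FlashAttention2 support are omitted).
def llama_pos_shift_attention_forward(
    self,
    hidden_states: torch.Tensor,
    attention_mask: Optional[torch.Tensor] = None,
    position_ids: Optional[torch.LongTensor] = None,
    past_key_value: Optional[Tuple[torch.Tensor]] = None,
    output_attentions: bool = False,
    use_cache: bool = False,
    padding_mask: Optional[torch.Tensor] = None,  # new keyword passed by transformers 4.34.x
) -> Tuple[torch.Tensor, Optional[torch.Tensor], Optional[Tuple[torch.Tensor]]]:
    # ... original pos-shift attention computation unchanged; `padding_mask`
    # only needs to be accepted here (it can be ignored or passed through,
    # depending on the attention implementation).
    ...
```

If you go with the path alternative in step 3, putting the cloned repo's root on PYTHONPATH (or inserting it into sys.path before importing streaming_llm) has the same effect as the editable install.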

This should work with transformers==4.34.1, which is the version I tested it on. There are other libraries (e.g., AttentionSink) that also implement this, but I cannot speak to how they perform compared to the original repo. If you do get a chance to check them out, please let me know how it goes.

Please let me know if you have any other questions :)