princeton-nlp / LLM-Shearing

[ICLR 2024] Sheared LLaMA: Accelerating Language Model Pre-training via Structured Pruning
https://arxiv.org/abs/2310.06694
MIT License

Flash-attn dependency issues #27

Closed: Forival closed this issue 9 months ago

Forival commented 9 months ago

You said that Flash Attention version 2 is not currently supported and may require manual modifications to the model file, but in requirements.txt you still install flash-attn 2. Does this cause any incompatibility issues?
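
For anyone running into this, a quick way to check which flash-attn version is actually installed in the environment (a minimal sketch, assuming the package was installed from PyPI under the distribution name `flash-attn`):

```python
# Check the installed flash-attn major version, since v2 may be
# incompatible with this codebase according to the README.
from importlib.metadata import version, PackageNotFoundError

try:
    installed = version("flash-attn")
except PackageNotFoundError:
    print("flash-attn is not installed")
else:
    major = int(installed.split(".")[0])
    if major >= 2:
        print(f"flash-attn {installed}: v2 detected, may require manual model changes")
    else:
        print(f"flash-attn {installed}: v1 detected")
```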

xiamengzhou commented 9 months ago

Thanks for catching this! It should be removed. The file is updated!