johnsmith0031 / alpaca_lora_4bit

MIT License

Flash Attention 2 #138

Closed · ghost closed this issue 1 year ago

ghost commented 1 year ago

@johnsmith0031 Hey, have you seen the new version? It's got a huge speed boost and memory use that's linear in sequence length :)

FlashAttention-2 currently supports:

- Ampere, Ada, or Hopper GPUs (e.g., A100, RTX 3090, RTX 4090, H100). Support for Turing GPUs (T4, RTX 2080) is coming soon; please use FlashAttention 1.x for Turing GPUs for now.
- Datatypes fp16 and bf16 (bf16 requires Ampere, Ada, or Hopper GPUs).
- All head dimensions up to 256. Head dim > 192 backward requires A100/A800 or H100/H800.

https://github.com/Dao-AILab/flash-attention#installation-and-features
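
For reference, here's a minimal sketch of calling the FlashAttention-2 kernel directly, assuming flash-attn 2.x is installed and you're on a supported (Ampere or newer) GPU in fp16. The shapes and the `causal=True` flag are just illustrative, not tied to this repo's integration:

```python
import torch
from flash_attn import flash_attn_func

# FlashAttention-2 needs Ampere (SM80) or newer; use FlashAttention 1.x on Turing
major, _ = torch.cuda.get_device_capability()
assert major >= 8, "FlashAttention-2 requires an Ampere, Ada, or Hopper GPU"

batch, seqlen, nheads, headdim = 2, 2048, 32, 128  # head dim must be <= 256

# Inputs are (batch, seqlen, nheads, headdim) tensors in fp16/bf16 on CUDA
q = torch.randn(batch, seqlen, nheads, headdim, dtype=torch.float16, device="cuda")
k = torch.randn_like(q)
v = torch.randn_like(q)

# Exact attention with memory linear in seqlen; causal=True for decoder-style LMs
out = flash_attn_func(q, k, v, causal=True)
print(out.shape)  # (batch, seqlen, nheads, headdim)
```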

johnsmith0031 commented 1 year ago

Cool! Thanks for the information!