huggingface / alignment-handbook

Robust recipes to align language models with human and AI preferences
https://huggingface.co/HuggingFaceH4
Apache License 2.0
4.45k stars · 385 forks

LoRA + FlashAttention2 speed up? #15

Open zhoumengbo opened 10 months ago

zhoumengbo commented 10 months ago

When fine-tuning Mistral with LoRA, does FlashAttention2 help speed up training? If so, how significant is the speedup, and where does the acceleration primarily come from?

lewtun commented 10 months ago

Hi @zhoumengbo, I don't recall if we benchmarked speed with FA2 and LoRA, but I do know that it's crucial for bringing the VRAM usage down.
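
For reference, here's a minimal sketch of how FlashAttention-2 is typically enabled alongside LoRA with `transformers` and `peft`. The model name and LoRA hyperparameters below are illustrative assumptions, not values taken from the handbook's recipes:

```python
# Minimal sketch: load Mistral with FlashAttention-2 and attach a LoRA adapter.
# Hyperparameters (r, lora_alpha, target_modules) are illustrative, not from
# the handbook's configs.
import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1",
    torch_dtype=torch.bfloat16,               # FA2 requires fp16 or bf16
    attn_implementation="flash_attention_2",  # requires flash-attn installed
)

peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, peft_config)
model.print_trainable_parameters()
```

The memory benefit comes from FlashAttention computing attention in blocks without ever materializing the full attention matrix, so activation memory scales linearly rather than quadratically with sequence length; this matters regardless of whether the trainable weights are full or LoRA adapters.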