zhoumengbo opened 10 months ago
When fine-tuning Mistral with LoRA, do you think FlashAttention2 helps in speeding up the process? If yes, how significant is the acceleration? Where is the primary acceleration achieved?
Hi @zhoumengbo, I don't recall whether we benchmarked speed with FA2 and LoRA, but I do know that it's crucial for bringing VRAM usage down.
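For reference, here is a minimal sketch of how FlashAttention-2 is typically enabled when loading Mistral with Hugging Face transformers before attaching LoRA adapters. The helper function name `mistral_lora_load_kwargs` is mine, not from this repo; the actual `from_pretrained` call assumes the `flash-attn` package is installed and a compatible GPU is available, so it is shown only in a comment.

```python
def mistral_lora_load_kwargs(use_flash_attn: bool = True) -> dict:
    """Build kwargs for AutoModelForCausalLM.from_pretrained.

    FlashAttention-2 in transformers requires half-precision weights,
    so bfloat16 is set alongside it.
    """
    kwargs = {"torch_dtype": "bfloat16"}
    if use_flash_attn:
        # Request the FA2 kernels; without this, transformers falls back
        # to its default attention implementation.
        kwargs["attn_implementation"] = "flash_attention_2"
    return kwargs

# Usage (not executed here; needs flash-attn and a supported GPU):
# from transformers import AutoModelForCausalLM
# model = AutoModelForCausalLM.from_pretrained(
#     "mistralai/Mistral-7B-v0.1", **mistral_lora_load_kwargs())
# ...then wrap with a peft LoraConfig / get_peft_model as usual.
```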