tlc4418 / llm_optimization

A repo for RLHF training and best-of-n (BoN) sampling over LLMs, with support for reward model ensembles.
https://arxiv.org/abs/2310.02743
MIT License

When (not) to use Flash Attention? #8

Closed by RylanSchaeffer 1 month ago

RylanSchaeffer commented 2 months ago

Some of the configs explicitly do not use flash attention. For instance, in config_rl.yaml, the pythia_44m_rlhf_ensemble sets use_flash_attention to false.

When is using flash attention (in)appropriate?

tlc4418 commented 1 month ago

For this I kept the Open-Assistant defaults. I don't think it makes much of a difference, though you may see some speedup from enabling flash attention more broadly. I'd refer you to the PR where they merged flash-attention support: https://github.com/LAION-AI/Open-Assistant/pull/2033
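
If you want to try it, flipping the flag in the config should be enough. A rough sketch of what that would look like in config_rl.yaml (the surrounding keys here are illustrative; only use_flash_attention is taken from the actual config):

```yaml
# Sketch of a config_rl.yaml entry; keys other than
# use_flash_attention are illustrative and may not match the repo exactly.
pythia_44m_rlhf_ensemble:
  use_flash_attention: true   # default was false; set true to test for speedup
```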