Closed: RylanSchaeffer closed this 1 month ago
For this I kept the Open Assistant defaults. I don't think it makes much of a difference, though you might see some speedup by using flash attention more broadly. I'd refer you to the PR where flash-attention support was merged: https://github.com/LAION-AI/Open-Assistant/pull/2033
Some of the configs explicitly do not use flash attention. For instance, in `config_rl.yaml`, the `pythia_44m_rlhf_ensemble` config sets `use_flash_attention` to `false`. When is using flash attention (in)appropriate?