vllm-project / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs
https://docs.vllm.ai
Apache License 2.0

[Performance]: About the use of flash_attn_varlen_func() #5702

Open xwentian2020 opened 1 week ago

xwentian2020 commented 1 week ago

Proposal to improve performance

No response

Report of performance regression

No response

Misc discussion on performance

Hi, vllm developers,

I read the code and noticed the use of FlashAttention. I assume this algorithm is used in vLLM to run the prefill stage more quickly. Am I right in thinking so? Also, the vLLM code uses flash_attn_varlen_func() rather than the other FlashAttention entry points, e.g., flash_attn_func, flash_attn_kvpacked_func, flash_attn_qkvpacked_func, flash_attn_varlen_kvpacked_func, flash_attn_varlen_qkvpacked_func, and flash_attn_with_kvcache. Could you share the reasoning behind this choice? Was it selected because it is faster than the other implementations? Is there a notable difference between it and the other variants in vLLM's setting?
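
For context, my understanding is that the varlen interface takes prompts of different lengths concatenated into a single packed tensor and described by cumulative-length offsets, instead of a padded `(batch, seqlen, ...)` batch as with flash_attn_func. Below is a minimal sketch of how I believe it is called, following the flash-attn documentation; the toy shapes and sequence lengths are just for illustration, not taken from the vLLM code:

```python
# Minimal sketch of the packed/varlen calling convention (toy shapes, not vLLM's code).
import torch
from itertools import accumulate
from flash_attn import flash_attn_varlen_func

nheads, headdim = 8, 64
seqlens = [5, 3, 7]          # three prompts of different lengths
total = sum(seqlens)         # 15 tokens packed along one "total tokens" axis

q = torch.randn(total, nheads, headdim, device="cuda", dtype=torch.float16)
k = torch.randn(total, nheads, headdim, device="cuda", dtype=torch.float16)
v = torch.randn(total, nheads, headdim, device="cuda", dtype=torch.float16)

# Cumulative offsets mark where each sequence starts/ends in the packed tensor: [0, 5, 8, 15]
cu_seqlens = torch.tensor([0] + list(accumulate(seqlens)),
                          dtype=torch.int32, device="cuda")

out = flash_attn_varlen_func(
    q, k, v,
    cu_seqlens_q=cu_seqlens, cu_seqlens_k=cu_seqlens,
    max_seqlen_q=max(seqlens), max_seqlen_k=max(seqlens),
    causal=True,
)
# out has shape (total, nheads, headdim): one output row per packed token, no padding.
```

I can see how this packed layout would fit continuous batching, where the prompts in a batch rarely share the same length, but I would like to confirm whether that is the actual motivation.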

Thanks.

Your current environment (if you think it is necessary)

The output of `python collect_env.py`
youkaichao commented 1 week ago

cc @WoosukKwon for flash attention